Abstract
In data-driven inverse optimization an observer aims to learn the preferences of an agent who solves a parametric optimization problem depending on an exogenous signal. Thus, the observer seeks the agent’s objective function that best explains a historical sequence of signals and corresponding optimal actions. We focus here on situations where the observer has imperfect information, that is, where the agent’s true objective function is not contained in the search space of candidate objectives, where the agent suffers from bounded rationality or implementation errors, or where the observed signal-response pairs are corrupted by measurement noise. We formalize this inverse optimization problem as a distributionally robust program minimizing the worst-case risk that the predicted decision (i.e., the decision implied by a particular candidate objective) differs from the agent’s actual response to a random signal. We show that our framework offers rigorous out-of-sample guarantees for different loss functions used to measure prediction errors and that the emerging inverse optimization problems can be exactly reformulated as (or safely approximated by) tractable convex programs when a new suboptimality loss function is used. We show through extensive numerical tests that the proposed distributionally robust approach to inverse optimization often attains better out-of-sample performance than state-of-the-art approaches.
Notes
A possible choice is \(\beta _N=\exp (-\sqrt{N})\).
We did not consider the ERM approach in Sect. 5 because it coincides with the VI approach for linear hypotheses.
References
Ackerberg, D., Benkard, C.L., Berry, S., Pakes, A.: Econometric tools for analyzing market outcomes. In: Heckman, J., Leamer, E. (eds.) Handbook of Econometrics, pp. 4171–4276. Elsevier, Amsterdam (2007)
Ahmed, S., Guan, Y.: The inverse optimal value problem. Math. Program. A 102, 91–110 (2005)
Ahuja, R.K., Orlin, J.B.: A faster algorithm for the inverse spanning tree problem. J. Algorithms 34, 177–193 (2000)
Ahuja, R.K., Orlin, J.B.: Inverse optimization. Oper. Res. 49, 771–783 (2001)
Anderson, B.D., Moore, J.B.: Optimal Control: Linear Quadratic Methods. Courier Corporation, North Chelmsford (2007)
Aswani, A., Shen, Z.-J.M., Siddiq, A.: Inverse optimization with noisy data. Preprint arXiv:1507.03266 (2015)
Bajari, P., Benkard, C.L., Levin, J.: Estimating dynamic models of imperfect competition. Technical Report, National Bureau of Economic Research (2004)
Ben-Ayed, O., Blair, C.E.: Computational difficulties of bilevel linear programming. Oper. Res. 38, 556–560 (1990)
Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)
Bertsimas, D., Gupta, V., Paschalidis, I.C.: Inverse optimization: a new perspective on the Black–Litterman model. Oper. Res. 60, 1389–1403 (2012)
Bertsimas, D., Gupta, V., Paschalidis, I.C.: Data-driven estimation in equilibrium using inverse optimization. Math. Program. 153, 595–633 (2015)
Boissard, E.: Simple bounds for convergence of empirical and occupation measures in 1-Wasserstein distance. Electron. J. Probab. 16, 2296–2333 (2011)
Bottasso, C.L., Prilutsky, B.I., Croce, A., Imberti, E., Sartirana, S.: A numerical procedure for inferring from experimental data the optimization cost functions using a multibody model of the neuro-musculoskeletal system. Multibody Syst. Dyn. 16, 123–154 (2006)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2009)
Braess, D., Nagurney, A., Wakolbinger, T.: On a paradox of traffic planning. Transp. Sci. 39, 446–450 (2005)
Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8, 231–357 (2015)
Burton, D., Toint, P.L.: On an instance of the inverse shortest paths problem. Math. Program. 53, 45–61 (1992)
Carr, S., Lovejoy, W.: The inverse newsvendor problem: choosing an optimal demand portfolio for capacitated resources. Manag. Sci. 46, 912–927 (2000)
Chan, T.C., Craig, T., Lee, T., Sharpe, M.B.: Generalized inverse multiobjective optimization with application to cancer therapy. Oper. Res. 62, 680–695 (2014)
Faragó, A., Szentesi, Á., Szviatovszki, B.: Inverse optimization in high-speed networks. Discrete Appl. Math. 129, 83–98 (2003)
Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162, 707–738 (2014)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer, Berlin (2001)
Harker, P., Pang, J.: Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Math. Program. 48, 161–220 (1990)
Heuberger, C.: Inverse combinatorial optimization: a survey on problems, methods, and results. J. Comb. Optim. 8, 329–361 (2004)
Hochbaum, D.S.: Efficient algorithms for the inverse spanning-tree problem. Oper. Res. 51, 785–797 (2003)
Iyengar, G., Kang, W.: Inverse conic programming with applications. Oper. Res. Lett. 33, 319–330 (2005)
Keshavarz, A., Wang, Y., Boyd, S.: Imputing a convex objective function. In: IEEE International Symposium on Intelligent Control, pp. 613–619 (2011)
Lofberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: IEEE International Symposium on Computer Aided Control Systems Design, pp. 284–289 (2004)
Markowitz, H.M.: Portfolio Selection: Efficient Diversification of Investments. Yale University Press, New Haven (1968)
Mohajerin Esfahani, P., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. (2017). https://doi.org/10.1007/s10107-017-1172-1
Murty, K.G., Kabadi, S.N.: Some NP-complete problems in quadratic and nonlinear programming. Math. Program. 39, 117–129 (1987)
Neumann-Denzau, G., Behrens, J.: Inversion of seismic data using tomographical reconstruction techniques for investigations of laterally inhomogeneous media. Geophys. J. Int. 79, 305–315 (1984)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
Saez-Gallego, J., Morales, J.M., Zugno, M., Madsen, H.: A data-driven bidding model for a cluster of price-responsive consumers of electricity. IEEE Trans. Power Syst. 31, 5001–5011 (2016)
Schaefer, A.J.: Inverse integer programming. Optim. Lett. 3, 483–489 (2009)
Shafieezadeh-Abadeh, S., Kuhn, D., Mohajerin Esfahani, P.: Regularization via mass transportation. Preprint arXiv:1710.10016 (2017)
Shafieezadeh-Abadeh, S., Mohajerin Esfahani, P., Kuhn, D.: Distributionally robust logistic regression. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 1576–1584. Curran Associates, Inc. (2015)
Sion, M.: On general minimax theorems. Pac. J. Math. 8, 171–176 (1958)
Terekhov, A.V., Pesin, Y.B., Niu, X., Latash, M.L., Zatsiorsky, V.M.: An analytical approach to the problem of inverse optimization with additive objective functions: an application to human prehension. J. Math. Biol. 61, 423–453 (2010)
Troutt, M.D., Pang, W.-K., Hou, S.-H.: Behavioral estimation of mathematical programming objective function coefficients. Manag. Sci. 52, 422–434 (2006)
Wang, L.: Cutting plane algorithms for the inverse mixed integer linear programming problem. Oper. Res. Lett. 37, 114–116 (2009)
Woodhouse, J.H., Dziewonski, A.M.: Mapping the upper mantle: three-dimensional modeling of earth structure by inversion of seismic waveforms. J. Geophys. Res. 89, 5953–5986 (1984)
Zhang, J., Xu, C.: Inverse optimization for linearly constrained convex separable programming problems. Eur. J. Oper. Res. 200, 671–679 (2010)
Acknowledgements
This work was supported by the Swiss National Science Foundation grant BSCGI0_157733.
Appendix A. Convex hypotheses
Consider the class \(\mathcal {F}\) of hypotheses of the form \(F_\theta (s,x):=\big \langle \theta , \Psi (x) \big \rangle \), where each component function of the feature map \(\Psi :{\mathbb {R}}^n\rightarrow {\mathbb {R}}^d\) is convex, and where the weight vector \(\theta \) ranges over a convex closed search space \(\Theta \subseteq {\mathbb {R}}^d_+\). Thus, by construction, \(F_\theta (s,x)\) is convex in x for every fixed \(\theta \in \Theta \). In the remainder we will assume without much loss of generality that the transformation from the signal-response pair (s, x) to the signal-feature pair \((s,\Psi (x))\) is Lipschitz continuous with Lipschitz modulus 1, that is, we require
\[ \big \Vert \big (s,\Psi (x)\big )-\big (s',\Psi (x')\big )\big \Vert \;\le \; \big \Vert (s,x)-(s',x')\big \Vert \qquad \forall (s,x),\,(s',x')\in \Xi . \qquad (34) \]
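For instance (this particular feature map is purely illustrative and is not prescribed above), the choice \(\Psi (x)=\big (x_1^2,\dots ,x_n^2,\,x_1,-x_1,\dots ,x_n,-x_n\big )\) generates hypotheses of the form \(F_\theta (s,x)=\sum _{j=1}^n a_j x_j^2+\big \langle b, x \big \rangle \) with \(a\ge 0\) and \(b\in {\mathbb {R}}^n\) unrestricted in sign, where \(a_j=\theta _j\) and \(b_j=\theta _{n+2j-1}-\theta _{n+2j}\); every component of \(\Psi \) is convex, and \(F_\theta (s,x)\) is indeed convex in x for every \(\theta \in {\mathbb {R}}^{3n}_+\).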
Before devising a safe conic approximation for the inverse optimization problem (13) with convex hypotheses, we recall that the conjugate of a function \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is defined through \(f^*(z)=\sup _{x\in {\mathbb {R}}^n} \big \langle z, x \big \rangle -f(x)\).
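To fix ideas with a simple univariate illustration, for \(f(x)=x^2\) the supremum in the definition is attained at \(x=z/2\), so \(f^*(z)=z^2/4\); consequently, a perspective term of the kind \(\theta f^*(z/\theta )\) appearing in (40) below reduces to \(z^2/(4\theta )\) for \(\theta >0\).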
Theorem A.1
(Convex hypotheses and suboptimality loss) Assume that \(\mathcal {F}\) represents the class of convex hypotheses induced by the feature map \(\Psi \) and with a convex closed search space \(\Theta \subseteq {\mathbb {R}}_+^d\) and that Assumption 5.1 holds. If the observer uses the suboptimality loss (4b) and measures risk using the \(\mathrm{CVaR}\) at level \(\alpha \in (0,1]\), then the following convex program provides a safe approximation for the distributionally robust inverse optimization problem (13) over the 1-Wasserstein ball:
Note that Theorem A.1 remains valid if \((\widehat{s}_i, \widehat{x}_i)\notin \Xi \) for some \(i\le N\).
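To make the ingredients of Theorem A.1 concrete, the following is a minimal sketch of the non-robust special case only: it minimizes the empirical CVaR of the suboptimality loss (4b) for linear hypotheses \(F_\theta (s,x)=\big \langle \theta , x \big \rangle \) and covering-type feasible sets \({\mathbb {X}}(s)=\{y\ge 0: Ay\ge s\}\), using LP duality to handle the inner minimization. The data, the normalization of \(\theta \), and the use of the cvxpy package are illustrative assumptions; the Wasserstein robustification and the convex-hypothesis machinery of Theorem A.1 are deliberately omitted.

```python
import cvxpy as cp
import numpy as np

# Illustrative (hypothetical) data: linear hypotheses F_theta(s, x) = <theta, x>
# and polyhedral feasible sets X(s) = {y >= 0 : A y >= s} (a covering-type polytope).
n, m, N = 4, 6, 50                               # decision dim, #constraints (= signal dim), samples
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(m, n))           # nonnegative constraint matrix
S_hat = rng.uniform(0.5, 1.5, size=(N, m))       # observed signals (right-hand sides)
X_hat = rng.uniform(0.0, 2.0, size=(N, n))       # observed (possibly noisy) responses
alpha = 0.1                                      # CVaR level

theta = cp.Variable(n, nonneg=True)              # candidate objective weights
tau = cp.Variable()                              # CVaR auxiliary variable
t = cp.Variable(N)                               # epigraph variables for the suboptimality loss
gammas = [cp.Variable(m, nonneg=True) for _ in range(N)]

constraints = [cp.sum(theta) == 1]               # illustrative normalization excluding theta = 0
for i in range(N):
    # LP duality: gamma_i >= 0 and A' gamma_i <= theta imply
    # <s_i, gamma_i> <= min_{y in X(s_i)} <theta, y>, so t_i upper-bounds the suboptimality loss.
    constraints += [A.T @ gammas[i] <= theta,
                    theta @ X_hat[i] - gammas[i] @ S_hat[i] <= t[i]]

# CVaR_alpha of the suboptimality losses (Rockafellar-Uryasev representation)
empirical_cvar = tau + cp.sum(cp.pos(t - tau)) / (alpha * N)
cp.Problem(cp.Minimize(empirical_cvar), constraints).solve()
print("imputed objective weights:", theta.value)
```

Replacing the empirical distribution by a 1-Wasserstein ball of radius \(\varepsilon >0\) introduces the dual multiplier and transportation-cost terms that appear in the proof below.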
Proof of Theorem A.1
As in the proof of Theorem 5.2 one can show that the objective function of the inverse optimization problem (13) is equivalent to (19). In the remainder, we derive a safe conic approximation for the (intractable) subordinate worst-case expectation problem
To this end, note that the suboptimality loss \(\ell _\theta (s,x) = \big \langle \theta , \Psi (x) \big \rangle - \min _{y \in {\mathbb {X}}(s)} \big \langle \theta , \Psi (y) \big \rangle \) depends on x only through \(\Psi (x)\). This motivates us to define a lifted suboptimality loss \(\ell ^\Psi _\theta (s,\psi ) = \big \langle \theta , \psi \big \rangle - \min _{y \in {\mathbb {X}}(s)} \big \langle \theta , \Psi (y) \big \rangle \), where \(\psi \) represents an element of the feature space \({\mathbb {R}}^d\), and the empirical distribution \(\widehat{\mathbb {P}}_N^\Psi = \frac{1}{N}\sum _{i=1}^N \delta _{(\widehat{s}_i, \Psi (\widehat{x}_i))}\) of the signal-feature pairs. Moreover, we denote by \({\mathbb {B}}^1_{\varepsilon } (\widehat{\mathbb {P}}_N^\Psi )\) the 1-Wasserstein ball of all distributions on \({\mathbb {S}} \times {\mathbb {R}}^d\) that have a distance of at most \(\varepsilon \) from \(\widehat{\mathbb {P}}_N^\Psi \). In the following we will show that the worst-case expectation
on the signal-feature space \({\mathbb {S}} \times {\mathbb {R}}^d\) provides a tractable upper bound on the worst-case expectation (36). By Definition 4.3, each distribution \(\mathbb {Q}\in {\mathbb {B}}^1_{\varepsilon } (\widehat{\mathbb {P}}_N)\) corresponds to a transportation plan \(\Pi \), that is, a joint distribution of two signal-response pairs (s, x) and \((s',x')\) under which (s, x) has marginal distribution \(\mathbb {Q}\) and \((s',x')\) has marginal distribution \(\widehat{\mathbb {P}}_N\). By the law of total probability, the transportation plan can be expressed as \(\Pi =\frac{1}{N}\sum _{i=1}^N \delta _{(\widehat{s}_i,\widehat{x}_i)}\otimes \mathbb {Q}_i\), where \(\mathbb {Q}_i\) denotes the conditional distribution of (s, x) given \((s',x')=(\widehat{s}_i,\widehat{x}_i)\), \(i\le N\); see also [30, Theorem 4.2]. Thus, the worst-case expectation (36) satisfies
where the first inequality follows from (34), while the second inequality follows from relaxing the implicit condition that the signal-feature pair \((s,\psi )\) must be supported on \(\{(s,\Psi (x)):(s,x)\in \Xi \}\subseteq \mathbb {S}\times {\mathbb {R}}^d\). By a similar reasoning as before, the last expression is readily recognized as the worst-case expectation (37). Thus, (37) indeed provides an upper bound on (36). Duality arguments borrowed from [30, Theorem 4.2] (a generic form of this duality is recalled after the proof) imply that the infinite-dimensional linear program (37) admits a strong dual of the form
By using the definitions of \(\mathbb S\) and \(\mathbb {X}(s)\) put forth in Assumption 5.1, the i-th member of the first constraint group in (38) is satisfied if and only if the optimal value of the maximization problem
does not exceed \(r_i\). A tedious but routine calculation shows that the dual of (39) can be represented as
Note that the perspective functions \(\theta _j \Psi ^*_j(z_{ij}/\theta _j)\) in the objective of (40) emerge from taking the conjugate of \(\theta _j \Psi _j(y)\). Thus, for \(\theta _j=0\) we must interpret \(\theta _j \Psi ^*_j(z_{ij}/\theta _j)\) as an indicator function in \(z_{ij}\) which vanishes for \(z_{ij}=0\) and equals \(\infty \) otherwise. By weak duality, (40) provides an upper bound on (39). We conclude that the i-th member of the first constraint group in (38) is satisfied whenever the dual problem (40) has a feasible solution whose objective value does not exceed \(r_i\). A similar reasoning shows that the i-th member of the second constraint group in (38) holds if and only if there exists \(\phi _{i2} \in \mathcal {C}^*\) such that
Thus, the worst-case expectation (36) is bounded above by the optimal value of the finite convex program
The claim then follows by substituting this convex program into (19).
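For the reader's convenience, and modulo the lifting to the signal-feature space used above, the strong duality borrowed from [30, Theorem 4.2] takes the following generic form for a 1-Wasserstein ball induced by a norm \(\Vert \cdot \Vert \) (under the regularity conditions of that theorem):
\[ \sup _{\mathbb {Q}\in {\mathbb {B}}^1_{\varepsilon } (\widehat{\mathbb {P}}_N^\Psi )} {\mathbb {E}}^{\mathbb {Q}}\big [\ell ^\Psi _\theta (s,\psi )\big ] \;=\; \inf _{\lambda \ge 0}\ \lambda \varepsilon + \frac{1}{N}\sum _{i=1}^N \sup _{(s,\psi )\in {\mathbb {S}}\times {\mathbb {R}}^d} \Big (\ell ^\Psi _\theta (s,\psi )-\lambda \big \Vert (s,\psi )-(\widehat{s}_i,\Psi (\widehat{x}_i))\big \Vert \Big ). \]
The epigraphic variables \(r_i\) and the two constraint groups in (38) arise from applying this identity to the loss at hand and moving the inner suprema into constraints.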