Abstract
We present strong mixed-integer programming (MIP) formulations for high-dimensional piecewise linear functions that correspond to trained neural networks. These formulations can be used for a number of important tasks, such as verifying that an image classification network is robust to adversarial inputs, or solving decision problems where the objective function is a machine learning model. We present a generic framework, which may be of independent interest, that provides a way to construct sharp or ideal formulations for the maximum of d affine functions over arbitrary polyhedral input domains. We apply this result to derive MIP formulations for a number of the most popular nonlinear operations (e.g. ReLU and max pooling) that are strictly stronger than other approaches from the literature. We corroborate this computationally, showing that our formulations are able to offer substantial improvements in solve time on verification tasks for image classification networks.
Notes
Note that if \(D = \left\{ x \in {\mathbb {R}}^\eta \,\big |\, Ax \leqslant b\right\} \) is polyhedral, then \(z \cdot D = \left\{ x \in {\mathbb {R}}^\eta \,\big | \, Ax \leqslant bz\right\} \).
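For a concrete special case (our illustration, not part of the original note): if \(D = \left\{ x \in {\mathbb {R}}^\eta \,\big |\, l \leqslant x \leqslant u\right\} \) is a box, then \(z \cdot D = \left\{ x \in {\mathbb {R}}^\eta \,\big |\, lz \leqslant x \leqslant uz\right\} \), so that \(z = 1\) recovers \(D\) and \(z = 0\) forces \(x = 0\).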
As is standard in a Benders’ decomposition approach, we can address this by adding a feasibility cut describing the domain of \({\overline{g}}\) (the region where it is finite valued) instead of an optimality cut of the form (10a).
Alternatively, a constructive proof of validity and idealness using Fourier–Motzkin elimination is given in the extended abstract of this work [4, Proposition 1].
In this context, logits are non-normalized predictions of the neural network so that \({\tilde{x}} \in [0,1]^{28 \times 28}\) is predicted to be digit \(i-1\) with probability \(\exp (f({\tilde{x}})_i)/\sum _{j=1}^{10}\exp (f({\tilde{x}})_j)\) [1].
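As a minimal illustration of this logit-to-probability mapping (our own sketch; the function name and the use of NumPy are not from the paper):

```python
import numpy as np

def logits_to_probabilities(logits):
    """Map the network's output logits f(x) to class probabilities via softmax.

    Subtracting the maximum logit first is a standard numerical-stability trick
    and does not change the resulting probabilities.
    """
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Example with 10 logits: component i corresponds to digit i - 1 in the
# paper's 1-based indexing, and the probabilities sum to 1.
probs = logits_to_probabilities(
    np.array([1.2, -0.3, 0.5, 2.1, 0.0, -1.0, 0.7, 0.2, 1.5, -0.4]))
print(probs.argmax(), probs.sum())
```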
We use cut callbacks in Gurobi to inject separated inequalities into the cut loop. While this offers little control over when the separation procedure is run, it allows us to take advantage of Gurobi’s sophisticated cut management implementation.
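A minimal sketch of how such a callback can be registered with gurobipy (our own illustration: `separate_most_violated` stands in for the paper's separation procedure, and the handles `model._x`, `model._y`, `model._z` are hypothetical names for the formulation's variables):

```python
import gurobipy as gp
from gurobipy import GRB

def make_cut_callback(separate_most_violated):
    """Return a Gurobi callback that injects separated inequalities as user cuts.

    `separate_most_violated` is a placeholder: given the LP-relaxation values of
    (x, y, z) at a node, it should return an (expr, rhs) pair describing a
    violated inequality expr <= rhs, or None if no violated inequality exists.
    """
    def callback(model, where):
        # Only separate at nodes whose LP relaxation has been solved to optimality.
        if where == GRB.Callback.MIPNODE and \
                model.cbGet(GRB.Callback.MIPNODE_STATUS) == GRB.OPTIMAL:
            x_val = model.cbGetNodeRel(model._x)
            y_val = model.cbGetNodeRel(model._y)
            z_val = model.cbGetNodeRel(model._z)
            cut = separate_most_violated(x_val, y_val, z_val)
            if cut is not None:
                expr, rhs = cut
                model.cbCut(expr <= rhs)  # handed to Gurobi's cut management
    return callback

# Usage sketch: stash the variable lists on the model, enable user cuts, solve.
# model._x, model._y, model._z = x_vars, y_vars, z_vars
# model.Params.PreCrush = 1  # needed so user cuts can be applied to the presolved model
# model.optimize(make_cut_callback(my_separation_routine))
```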
References
https://developers.google.com/machine-learning/glossary/#logits. Accessed 6 Feb 2020
Alipanahi, B., Delong, A., Weirauch, M.T., Frey, B.J.: Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015)
Amos, B., Xu, L., Kolter, J.Z.: Input convex neural networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 146–155. PMLR, International Convention Centre, Sydney (2017)
Anderson, R., Huchette, J., Tjandraatmadja, C., Vielma, J.P.: Strong mixed-integer programming formulations for trained neural networks. In: A. Lodi, V. Nagarajan (eds.) Proceedings of the 20th Conference on Integer Programming and Combinatorial Optimization, pp. 27–42. Springer International Publishing, Cham (2019). arxiv:1811.08359
Arora, R., Basu, A., Mianjy, P., Mukherjee, A.: Understanding deep neural networks with rectified linear units (2016). arXiv preprint arXiv:1611.01491
Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017)
Atamtürk, A., Gómez, A.: Strong formulations for quadratic optimization with M-matrices and indicator variables. Math. Program. 170, 141–176 (2018)
Balas, E.: Disjunctive programming and a hierarchy of relaxations for discrete optimization problems. SIAM J. Algorithmic Discrete Methods 6(3), 466–486 (1985)
Balas, E.: Disjunctive programming: properties of the convex hull of feasible points. Discrete Appl. Math. 89, 3–44 (1998)
Bartolini, A., Lombardi, M., Milano, M., Benini, L.: Neuron constraints to model complex real-world problems. In: International Conference on the Principles and Practice of Constraint Programming, pp. 115–129. Springer, Berlin (2011)
Bartolini, A., Lombardi, M., Milano, M., Benini, L.: Optimization and controlled systems: a case study on thermal aware workload dispatching. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pp. 427–433 (2012)
Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A.V., Criminisi, A.: Measuring neural net robustness with constraints. In: Advances in Neural Information Processing Systems, pp. 2613–2621 (2016)
Belotti, P., Bonami, P., Fischetti, M., Lodi, A., Monaci, M., Nogales-Gomez, A., Salvagnin, D.: On handling indicator constraints in mixed integer programming. Comput. Optim. Appl. 65(3), 545–566 (2016)
Bertsimas, D., Tsitsiklis, J.: Introduction to Linear Optimization. Athena Scientific, Belmont, MA (1997)
Bienstock, D., Muñoz, G., Pokutta, S.: Principled deep neural network training through linear programming (2018). arXiv preprint arXiv:1810.03218
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin (2006)
Bonami, P., Lodi, A., Tramontani, A., Wiese, S.: On mathematical programming with indicator constraints. Math. Program. 151(1), 191–223 (2015)
Boureau, Y.L., Bach, F., LeCun, Y., Ponce, J.: Learning mid-level features for recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2559–2566 (2010)
Bunel, R., Turkaslan, I., Torr, P.H., Kohli, P., Kumar, M.P.: A unified view of piecewise linear neural network verification. In: Advances in Neural Information Processing Systems (2018)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57 (2017)
Chen, L., Ma, W., Natarajan, K., Simchi-Levi, D., Yan, Z.: Distributionally robust linear and discrete optimization with marginals. Available at SSRN 3159473 (2018)
Cheng, C.H., Nührenberg, G., Ruess, H.: Maximum resilience of artificial neural networks. In: International Symposium on Automated Technology for Verification and Analysis. Springer, Cham (2017)
Dulac-Arnold, G., Evans, R., van Hasselt, H., Sunehag, P., Lillicrap, T., Hunt, J., Mann, T., Weber, T., Degris, T., Coppin, B.: Deep reinforcement learning in large discrete action spaces (2015). arxiv:1512.07679
Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Output range analysis for deep feedforward neural networks. In: NASA Formal Methods Symposium (2018)
Dvijotham, K., Gowal, S., Stanforth, R., Arandjelovic, R., O’Donoghue, B., Uesato, J., Kohli, P.: Training verified learners with learned verifiers (2018). arxiv:1805.10265
Dvijotham, K., Stanforth, R., Gowal, S., Mann, T., Kohli, P.: A dual approach to scalable verification of deep networks. In: Thirty-Fourth Conference on Uncertainty in Artificial Intelligence (2018)
Ehlers, R.: Formal verification of piece-wise linear feed-forward neural networks. In: International Symposium on Automated Technology for Verification and Analysis. Springer, Cham (2017)
Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., Madry, A.: Exploring the landscape of spatial robustness. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 1802–1811. PMLR, Long Beach, CA (2019). http://proceedings.mlr.press/v97/engstrom19a.html. Accessed 6 Feb 2020
Fischetti, M., Jo, J.: Deep neural networks and mixed integer linear optimization. Constraints 23, 296–309 (2018)
Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style (2015). arxiv:1508.06576
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: Proceedings of the 30th International Conference on Machine Learning, vol. 28, pp. 1319–1327 (2013)
Grimstad, B., Andersson, H.: ReLU networks as surrogate models in mixed-integer linear programs. Comput. Chem. Eng. 131, 106580 (2019)
Haneveld, W.K.K.: Robustness against dependence in PERT: an application of duality and distributions with known marginals. In: Stochastic Programming 84 Part I, pp. 153–182. Springer (1986)
Hanin, B.: Universal function approximation by deep neural nets with bounded width and ReLU activations (2017). arXiv preprint arXiv:1708.02691
Hijazi, H., Bonami, P., Cornuéjols, G., Ouorou, A.: Mixed-integer nonlinear programs featuring “on/off” constraints. Comput. Optim. Appl. 52(2), 537–558 (2012)
Hijazi, H., Bonami, P., Ouorou, A.: A note on linear on/off constraints (2014). http://www.optimization-online.org/DB_FILE/2014/04/4309.pdf. Accessed 6 Feb 2020
Huber, B., Rambau, J., Santos, F.: The Cayley Trick, lifting subdivisions and the Bohne–Dress theorem on zonotopal tilings. J. Eur. Math. Soc. 2(2), 179–198 (2000)
Huchette, J.: Advanced mixed-integer programming formulations: methodology, computation, and application. Ph.D. thesis, Massachusetts Institute of Technology (2018)
Jeroslow, R., Lowe, J.: Modelling with integer variables. Math. Program. Study 22, 167–184 (1984)
Jeroslow, R.G.: Alternative formulations of mixed integer programs. Ann. Oper. Res. 12, 241–276 (1988)
Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: International Conference on Computer Aided Verification, pp. 97–117 (2017)
Khalil, E.B., Gupta, A., Dilkina, B.: Combinatorial attacks on binarized neural networks. In: International Conference on Learning Representations (2019)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2014). arxiv:1412.6980
Korte, B., Vygen, J.: Combinatorial Optimization: Theory and Algorithms. Springer, Berlin (2000)
Kumar, A., Serra, T., Ramalingam, S.: Equivalent and approximate transformations of deep neural networks (2019). arxiv:1905.11428
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
Liu, C., Arnon, T., Lazarus, C., Barrett, C., Kochenderfer, M.J.: Algorithms for verifying deep neural networks (2019). arxiv:1903.06758
Lombardi, M., Gualandi, S.: A lagrangian propagator for artificial neural networks in constraint programming. Constraints 21(4), 435–462 (2016)
Lombardi, M., Milano, M.: Boosting combinatorial problem modeling with machine learning. In: Proceedings IJCAI, pp. 5472–5478 (2018)
Lombardi, M., Milano, M., Bartolini, A.: Empirical decision model learning. Artif. Intell. 244, 343–367 (2017)
Lomuscio, A., Maganti, L.: An approach to reachability analysis for feed-forward ReLU neural networks (2017). arxiv:1706.07351
Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: ICML Workshop on Deep Learning for Audio, Speech and Language (2013)
Mladenov, M., Boutilier, C., Schuurmans, D., Elidan, G., Meshi, O., Lu, T.: Approximate linear programming for logistic Markov decision processes. In: Proceedings of the Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17), pp. 2486–2493. Melbourne, Australia (2017)
Mordvintsev, A., Olah, C., Tyka, M.: Inceptionism: Going deeper into neural networks (2015). https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html. Accessed 6 Feb 2020
Natarajan, K., Song, M., Teo, C.P.: Persistency model and its applications in choice modeling. Manage. Sci. 55(3), 453–469 (2009)
Olah, C., Mordvintsev, A., Schubert, L.: Feature visualization. Distill (2017). https://distill.pub/2017/feature-visualization. Accessed 6 Feb 2020
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: IEEE European Symposium on Security and Privacy, pp. 372–387 (2016)
Raghunathan, A., Steinhardt, J., Liang, P.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp. 10,900–10,910. Curran Associates Inc. (2018)
Ryu, M., Chow, Y., Anderson, R., Tjandraatmadja, C., Boutilier, C.: CAQL: Continuous action Q-learning (2019). arxiv:1909.12397
Salman, H., Yang, G., Zhang, H., Hsieh, C.J., Zhang, P.: A convex relaxation barrier to tight robustness verification of neural networks (2019). arxiv:1902.08722
Say, B., Wu, G., Zhou, Y.Q., Sanner, S.: Nonlinear hybrid planning with deep net learned transition models and mixed-integer linear programming. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 750–756 (2017)
Schweidtmann, A.M., Mitsos, A.: Global deterministic optimization with artificial neural networks embedded. J. Optim. Theory Appl. 180, 925–948 (2019)
Serra, T., Ramalingam, S.: Empirical bounds on linear regions of deep rectifier networks (2018). arxiv:1810.03370
Serra, T., Tjandraatmadja, C., Ramalingam, S.: Bounding and counting linear regions of deep neural networks. In: Thirty-Fifth International Conference on Machine Learning (2018)
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014)
Tawarmalani, M., Sahinidis, N.: Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software and Applications, vol. 65. Springer, Berlin (2002)
Tjeng, V., Xiao, K., Tedrake, R.: Verifying neural networks with mixed integer programming. In: International Conference on Learning Representations (2019)
Trespalacios, F., Grossmann, I.E.: Improved big-M reformulation for generalized disjunctive programs. Comput. Chem. Eng. 76, 98–103 (2015)
Vielma, J.P.: Mixed integer linear programming formulation techniques. SIAM Rev. 57(1), 3–57 (2015)
Vielma, J.P.: Embedding formulations and complexity for unions of polyhedra. Manage. Sci. 64(10), 4471–4965 (2018)
Vielma, J.P.: Small and strong formulations for unions of convex sets from the Cayley embedding. Math. Program. 177, 21–53 (2018)
Vielma, J.P., Nemhauser, G.: Modeling disjunctive constraints with a logarithmic number of binary variables and constraints. Math. Program. 128(1–2), 49–72 (2011)
Weibel, C.: Minkowski sums of polytopes: combinatorics and computation. Ph.D. thesis, École Polytechnique Fédérale de Lausanne (2007)
Weiss, G.: Stochastic bounds on distributions of optimal value functions with applications to PERT, network flows and reliability. Oper. Res. 34(4), 595–605 (1986)
Wong, E., Kolter, J.Z.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: International Conference on Machine Learning (2018)
Wong, E., Schmidt, F., Metzen, J.H., Kolter, J.Z.: Scaling provable adversarial defenses. In: 32nd Conference on Neural Information Processing Systems (2018)
Wu, G., Say, B., Sanner, S.: Scalable planning with Tensorflow for hybrid nonlinear domains. In: Advances in Neural Information Processing Systems, pp. 6276–6286 (2017)
Xiao, K.Y., Tjeng, V., Shafiullah, N.M., Madry, A.: Training for faster adversarial robustness verification via inducing ReLU stability. In: International Conference on Learning Representations (2019)
Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolution network (2015). arxiv:1505.00853
Zeng, H., Edwards, M.D., Liu, G., Gifford, D.K.: Convolutional neural network architectures for predicting DNA-protein binding. Bioinformatics 32(12), 121–127 (2016)
Acknowledgements
The authors gratefully acknowledge Yeesian Ng and Ondřej Sýkora for many discussions on the topic of this paper, and for their work on the development of the tf.opt package used in the computational experiments.
An extended abstract version of this paper appeared in [4].
A tight big-M formulation for Max on polyhedral domains
We present a tightened big-M formulation for the maximum of d affine functions over an arbitrary polytope input domain. The formulation can be viewed as a relaxation of the system in Proposition 4 in which we select d inequalities from each of (10a) and (10b): those corresponding to \({\overline{\alpha }}, \underline{\alpha } \in \{ w^1, \ldots , w^d \}\). This subset still yields a valid formulation, while obviating the need for a separation procedure. The formulation can also be viewed as an application of Proposition 6.2 of Vielma [72], and is similar to the big-M formulations for generalized disjunctive programs of Trespalacios and Grossmann [71].
Proposition 15
Take coefficients N such that, for each \(\ell , k \in [\![d ]\!]\) with \(\ell \ne k\),
and \(N^{k,k,+} = N^{k,k,-} = 0\) for all \(k \in [\![d ]\!]\). Then a valid formulation for \({\text {gr}}(\texttt {Max}{} \circ (f^1,\ldots ,f^d); D)\) is:
The tightest possible coefficients in (32) can be computed exactly by solving an LP for each pair of input affine functions \(\ell \ne k\). While this may be prohibitively expensive when d is large, it is viable when d is a small fixed constant. For example, a max pooling neuron computes the maximum over a rectangular window of a larger array [32, Sect. 9.3], and is frequently used in image classification architectures. Typically, max pooling units use a \(2 \times 2\) or a \(3 \times 3\) window, in which case \(d=4\) or \(d=9\), respectively.
In addition, if in practice we observe that the set \(D_{|k}\) is empty, we can infer that the neuron is not irreducible, since the k-th input function is never the maximum, and we can safely prune that function. In particular, if the LP used to compute the coefficients for \(z_k\) is proven infeasible, we can prune the k-th function.
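The following gurobipy sketch illustrates one such LP (our own illustration: since the displayed equations of Proposition 15 are not reproduced above, the objective — maximizing \((w^\ell - w^k) \cdot x\) over a polyhedral description \(Ax \leqslant b\) of the relevant domain — and all names are assumptions). An infeasible LP corresponds to the pruning case just described.

```python
import gurobipy as gp
from gurobipy import GRB
import numpy as np

def tightest_coefficient(w_l, w_k, A, b):
    """Solve one LP: maximize (w_l - w_k) . x over {x : A x <= b}.

    Returns the optimal value, or None if the LP is infeasible, in which case
    the k-th input function can be pruned as discussed above.  (Sketch only:
    the exact domain and objective used in Proposition 15 are assumptions.)
    """
    m, n = A.shape
    model = gp.Model("bigM-coefficient")
    model.Params.OutputFlag = 0
    x = model.addVars(n, lb=-GRB.INFINITY, name="x")
    model.addConstrs(
        (gp.quicksum(A[i, j] * x[j] for j in range(n)) <= b[i] for i in range(m)),
        name="domain")
    model.setObjective(
        gp.quicksum((w_l[j] - w_k[j]) * x[j] for j in range(n)), GRB.MAXIMIZE)
    model.optimize()
    if model.Status in (GRB.INFEASIBLE, GRB.INF_OR_UNBD):
        return None  # empty domain piece: prune the k-th function
    return model.ObjVal

# One LP per ordered pair (l, k) with l != k, i.e. d*(d-1) LPs in total;
# for a 2x2 max pooling window this is 4*3 = 12 small LPs.
```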