ORSuite: Benchmarking Suite for Sequential Operations Models

Authors:

Christopher Archer,

Siddhartha Banerjee,

Mayleen Cortez,

Sean R. Sinclair,

Christina Lee Yu

ACM SIGMETRICS Performance Evaluation Review, Volume 49, Issue 2

Pages 57-61

https://doi.org/10.1145/3512798.3512819

Published: 20 January 2022

Abstract

Reinforcement learning (RL) has received widespread attention across multiple communities, but experiments have focused primarily on large-scale game-playing and robotics tasks. In this paper we introduce ORSuite, an open-source library containing environments, algorithms, and instrumentation for operational problems. Our package is designed to motivate researchers in the reinforcement learning community to develop and evaluate algorithms on operational tasks, and to account for the true multi-objective nature of these problems by looking at metrics beyond cumulative reward.

Cited By

Sinclair S. (2024). Adaptivity, Structure, and Objectives in Sequential Decision-Making. ACM SIGMETRICS Performance Evaluation Review 51:3 (38-41). Online publication date: 5-Jan-2024.
https://dl.acm.org/doi/10.1145/3639830.3639846

Published In

ACM SIGMETRICS Performance Evaluation Review Volume 49, Issue 2

September 2021

73 pages

ISSN: 0163-5999

DOI: 10.1145/3512798

Editor:
Zhenhua Liu
Stony Brook University

Copyright © 2022. Copyright is held by the owner/author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
