
DOI: 10.1145/3209108.3209185

Distribution-based objectives for Markov Decision Processes

Published: 09 July 2018

Abstract

We consider distribution-based objectives for Markov Decision Processes (MDPs). This class of objectives gives rise to an interesting trade-off between full and partial information: as with full observation, the strategy in the MDP can depend on the state of the system, but, as with partial information, the strategy must account for all the states at the same time.
In this paper, we focus on two safety problems that arise naturally in this context: existential and universal safety. Given an MDP A and a closed and convex polytope H of probability distributions over the states of A, the existential safety problem asks whether there exist a distribution Δ in H and a strategy of A such that, starting from Δ, repeatedly applying this strategy keeps the distribution in H forever. The universal safety problem asks whether, for every distribution in H, there exists such a strategy of A that keeps the distribution in H forever. We prove that both problems are decidable, with tight complexity bounds: existential safety is PTIME-complete, while universal safety is co-NP-complete.
Further, we compare these results with the existential and universal safety problems for Rabin's probabilistic finite-state automata (PFAs), the subclass of partially observable MDPs with zero observations. Compared to MDPs, strategies of PFAs are not state-dependent. In sharp contrast to the PTIME result, we show that existential safety for PFAs is undecidable, whether the boundaries of H are closed or open. On the other hand, universal safety for PFAs turns out to be decidable in EXPTIME, with a co-NP lower bound. Finally, we show that an alternate representation of the input polytope allows us to improve the complexity of universal safety for MDPs and PFAs.
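To make the distribution-transformer view concrete, here is a minimal Python sketch of one strategy step and a safety check against a polytope H. The two-state MDP, the polytope, and the memoryless strategy below are invented for illustration only; the strategies considered in the paper may in general depend on the history, and H may be an arbitrary closed convex polytope.

```python
# Hypothetical 2-state MDP: each action maps to a row-stochastic transition
# matrix (these numbers are invented for illustration).
M = {
    "a": [[0.5, 0.5], [0.0, 1.0]],
    "b": [[1.0, 0.0], [0.3, 0.7]],
}

def step(delta, sigma):
    """One distribution-transformer step under a state-dependent strategy sigma:
    delta'(t) = sum over s of delta[s] * M[sigma[s]][s][t]."""
    n = len(delta)
    return [sum(delta[s] * M[sigma[s]][s][t] for s in range(n)) for t in range(n)]

# H: a closed convex polytope over distributions; here, simply the
# distributions on the simplex satisfying delta[1] <= 0.6.
def in_H(delta, tol=1e-9):
    return delta[1] <= 0.6 + tol

delta = [0.8, 0.2]          # an initial distribution chosen inside H
sigma = {0: "b", 1: "b"}    # a memoryless state-dependent strategy
safe = True
for _ in range(100):
    delta = step(delta, sigma)
    safe = safe and in_H(delta)
print(safe)  # True: the distribution drifts toward state 0 and never leaves H
```

Existential safety asks whether some such pair (Δ, strategy) exists that stays in H forever; universal safety demands a suitable strategy for every starting distribution in H. The bounded loop above is only a finite-horizon check, whereas the paper's decision procedures certify safety over an infinite horizon.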





Published In

LICS '18: Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science
July 2018
960 pages
ISBN:9781450355834
DOI:10.1145/3209108
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

LICS '18

Acceptance Rates

Overall acceptance rate: 215 of 622 submissions (35%)


Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 1
Reflects downloads up to 19 Nov 2024


Cited By

  • (2024) CTL Model Checking of MDPs over Distribution Spaces: Algorithms and Sampling-based Computations. Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control, 1-12. DOI: 10.1145/3641513.3651397 (14 May 2024)
  • (2024) Distributional Reachability for Markov Decision Processes: Theory and Applications. IEEE Transactions on Automatic Control 69(7), 4598-4613. DOI: 10.1109/TAC.2023.3341282 (July 2024)
  • (2023) Stochastic Games with Synchronization Objectives. Journal of the ACM 70(3), 1-35. DOI: 10.1145/3588866 (23 May 2023)
  • (2023) MDPs as Distribution Transformers: Affine Invariant Synthesis for Safety Objectives. Computer Aided Verification, 86-112. DOI: 10.1007/978-3-031-37709-9_5 (17 July 2023)
  • (2022) On Eventual Non-negativity and Positivity for the Weighted Sum of Powers of Matrices. Automated Reasoning, 671-690. DOI: 10.1007/978-3-031-10769-6_39 (8 August 2022)
  • (2022) Bounds for Synchronizing Markov Decision Processes. Computer Science – Theory and Applications, 133-151. DOI: 10.1007/978-3-031-09574-0_9 (29 June 2022)
