Abstract
Parametric Markov chains (pMCs) have transitions labeled with functions over a fixed set of parameters. They are useful if the exact transition probabilities are uncertain, e.g., when checking a model for robustness. This paper presents a simple way to check whether the expected total reward until reaching a given target state is monotonic in (some of) the parameters. We exploit this monotonicity together with parameter lifting to find an \(\varepsilon \)-close bound on the optimal expected total reward. Our results are also useful to automatically synthesise controllers with a fixed memory structure for partially observable Markov decision processes (POMDPs), a popular model in AI planning. We experimentally show that our approach can successfully find \(\varepsilon \)-optimal controllers for optimal budget in such POMDPs.
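To illustrate the central notion of monotonicity on a minimal, hypothetical example (not taken from the paper), consider a pMC with one parametric state s0 (reward 1) that reaches the target t with probability p and loops back to s0 with probability 1-p; its expected total reward is 1/p, which is monotonically decreasing in p on (0,1). The following sketch, using sympy and illustrative names only, checks this via the sign of the derivative:

```python
# Toy illustration (ours, not a model from the paper): a pMC with one parametric
# state s0 (reward 1) that reaches the target t with probability p and loops back
# to s0 with probability 1-p.  Its expected total reward is ER(s0) = 1/p, which is
# monotonically decreasing in p on (0,1).
import sympy as sp

p = sp.symbols('p', positive=True)
ER = sp.symbols('ER')

# Bellman-style equation ER(s0) = 1 + (1-p) * ER(s0) for the expected total reward.
solution = sp.solve(sp.Eq(ER, 1 + (1 - p) * ER), ER)[0]   # -> 1/p
derivative = sp.diff(solution, p)                          # -> -1/p**2

print(solution, derivative)  # the derivative is negative on (0,1): ER(s0) decreases in p
```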
Supported by DFG RTG 2236 “UnRAVeL” and DFG 1462/4-1 “PASYWI”.
Notes
1. ETR = Existential Theory of the Reals. ETR-complete decision problems are as hard as deciding whether a multivariate polynomial has a real root. ETR lies between NP and PSPACE.
2. To be precise, on the interior of the closed set R.
References
Amato, C., Bernstein, D.S., Zilberstein, S.: Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Auton. Agents Multi Agent Syst. 21(3), 293–320 (2010)
Baier, C., Hensel, C., Hutschenreiter, L., Junges, S., Katoen, J.P., Klein, J.: Parametric Markov chains: PCTL complexity and fraction-free Gaussian elimination. Inf. Comput. 272 (2020)
Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press, Cambridge (2008)
Bonet, B.: Solving large POMDPs using real time dynamic programming. In: Proceedings of the AAAI Fall Symposium on POMDPs (1998)
Bonet, B., Geffner, H.: Solving POMDPs: RTDP-Bel vs. point-based algorithms. In: IJCAI, pp. 1641–1646 (2009)
Bork, A., Junges, S., Katoen, J.-P., Quatmann, T.: Verification of Indefinite-Horizon POMDPs. In: Hung, D.V., Sokolsky, O. (eds.) ATVA 2020. LNCS, vol. 12302, pp. 288–304. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59152-6_16
Bork, A., Katoen, J.-P., Quatmann, T.: Under-approximating expected total rewards in POMDPs. In: TACAS 2022. LNCS, vol. 13244, pp. 22–40. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-99527-0_2
Braziunas, D., Boutilier, C.: Stochastic local search for POMDP controllers. In: AAAI, pp. 690–696. AAAI Press/The MIT Press, Menlo Park (2004)
Carr, S., Jansen, N., Topcu, U.: Verifiable RNN-based policies for POMDPs under temporal logic constraints. In: IJCAI, pp. 4121–4127. ijcai.org (2020)
Carr, S., Jansen, N., Wimmer, R., Serban, A.C., Becker, B., Topcu, U.: Counterexample-guided strategy improvement for POMDPs using recurrent neural networks. In: IJCAI, pp. 5532–5539. ijcai.org (2019)
Chatterjee, K., Chmelik, M., Davies, J.: A symbolic SAT-based algorithm for almost-sure reachability with small strategies in POMDPs. In: AAAI, pp. 3225–3232. AAAI Press, Menlo Park (2016)
Chen, T., Feng, Y., Rosenblum, D.S., Su, G.: Perturbation analysis in verification of discrete-time Markov chains. In: Baldan, P., Gorla, D. (eds.) CONCUR 2014. LNCS, vol. 8704, pp. 218–233. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44584-6_16
Cleaveland, M., Ruchkin, I., Sokolsky, O., Lee, I.: Monotonic safety for scalable and data-efficient probabilistic safety analysis. In: ICCPS, pp. 92–103. ACM (2022)
Daws, C.: Symbolic and parametric model checking of discrete-time Markov chains. In: Liu, Z., Araki, K. (eds.) ICTAC 2004. LNCS, vol. 3407, pp. 280–294. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31862-0_21
Doshi, F., Pineau, J., Roy, N.: Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs. In: ICML. ACM International Conference Proceeding Series, vol. 307, pp. 256–263. ACM (2008)
Fang, X., Calinescu, R., Gerasimou, S., Alhwikem, F.: Fast parametric model checking through model fragmentation. In: ICSE, pp. 835–846. IEEE (2021)
Filieri, A., Ghezzi, C., Tamburrelli, G.: Run-time efficient probabilistic model checking. In: ICSE. ACM (2011)
Gainer, P., Hahn, E.M., Schewe, S.: Accelerated model checking of parametric Markov chains. In: Lahiri, S.K., Wang, C. (eds.) ATVA 2018. LNCS, vol. 11138, pp. 300–316. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01090-4_18
Gouberman, A., Siegle, M., Tati, B.: Markov chains with perturbed rates to absorption: theory and application to model repair. Perform. Eval. 130, 32–50 (2019)
Hahn, E.M., Han, T., Zhang, L.: Synthesis for PCTL in parametric Markov decision processes. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp. 146–161. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-5_12
Hahn, E.M., Hermanns, H., Zhang, L.: Probabilistic reachability for parametric Markov models. Int. J. Softw. Tools Technol. Transfer 13, 319 (2011). https://doi.org/10.1007/s10009-010-0146-x
Hartmanns, A., Klauck, M., Parker, D., Quatmann, T., Ruijters, E.: The quantitative verification benchmark set. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019. LNCS, vol. 11427, pp. 344–350. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17462-0_20
Hauskrecht, M.: Value-function approximations for partially observable Markov decision processes. J. Artif. Intell. Res. 13, 33–94 (2000)
Hensel, C., Junges, S., Katoen, J.-P., Quatmann, T., Volk, M.: The probabilistic model checker Storm. Int. J. Softw. Tools Technol. Transfer 1–22 (2021). https://doi.org/10.1007/s10009-021-00633-z
Hutschenreiter, L., Baier, C., Klein, J.: Parametric Markov chains: PCTL complexity and fraction-free Gaussian elimination. In: GandALF. EPTCS, vol. 256 (2017)
Jansen, N., Corzilius, F., Volk, M., Wimmer, R., Ábrahám, E., Katoen, J.-P., Becker, B.: Accelerating parametric probabilistic verification. In: Norman, G., Sanders, W. (eds.) QEST 2014. LNCS, vol. 8657, pp. 404–420. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10696-0_31
Junges, S.: Parameter synthesis in Markov models. Ph.D. thesis, RWTH Aachen University, Germany (2020)
Junges, S., et al.: Parameter synthesis for Markov models. CoRR abs/1903.07993 (2019)
Junges, S., et al.: Finite-state controllers of POMDPs using parameter synthesis. In: UAI. AUAI Press, Monterey (2018)
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)
Lanotte, R., Maggiolo-Schettini, A., Troina, A.: Parametric probabilistic transition systems for system design and analysis. Formal Aspects Comput. 19(1), 93–109 (2007)
Littman, M.L., Cassandra, A.R., Kaelbling, L.P.: Learning policies for partially observable environments: Scaling up. In: ICML, pp. 362–370. Morgan Kaufmann (1995)
Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Oper. Res. 39(1), 162–175 (1991)
Pathak, S., Ábrahám, E., Jansen, N., Tacchella, A., Katoen, J.P.: A greedy approach for the efficient repair of stochastic models. In: NFM. LNCS, vol. 9058 (2015)
Quatmann, T., Dehnert, C., Jansen, N., Junges, S., Katoen, J.-P.: Parameter synthesis for Markov models: faster than ever. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 50–67. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_4
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 4th edn. Pearson, Boston (2020)
Silver, D., Veness, J.: Monte-Carlo planning in large POMDPs. In: NIPS, pp. 2164–2172. Curran Associates, Inc., Red Hook (2010)
Smith, T., Simmons, R.G.: Heuristic search value iteration for POMDPs. In: UAI, pp. 520–527. AUAI Press (2004)
Spel, J., Junges, S., Katoen, J.-P.: Are parametric Markov chains monotonic? In: Chen, Y.-F., Cheng, C.-H., Esparza, J. (eds.) ATVA 2019. LNCS, vol. 11781, pp. 479–496. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31784-3_28
Spel, J., Junges, S., Katoen, J.P.: Are parametric Markov chains monotonic? CoRR abs/1907.08491 (2019), extended version
Spel, J., Junges, S., Katoen, J.-P.: Finding provably optimal Markov chains. In: TACAS 2021. LNCS, vol. 12651, pp. 173–190. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72016-2_10
Spel, J., Stein, S., Katoen, J.P.: POMDP controllers with optimal budget (artifact). Zenodo (2022). https://doi.org/10.5281/zenodo.6793377
Wang, Y., Chaudhuri, S., Kavraki, L.E.: Bounded policy synthesis for POMDPs with safe-reachability objectives. In: AAMAS, pp. 238–246. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC/ACM (2018)
Winkler, T., Junges, S., Pérez, G.A., Katoen, J.: On the complexity of reachability in parametric Markov decision processes. In: Proceedings of CONCUR. LIPIcs, vol. 140, pp. 14:1–14:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
Winterer, L., et al.: Motion planning under partial observability using game-based abstraction. In: CDC, pp. 2201–2208. IEEE (2017)
Yang, L., Murugesan, S., Zhang, J.: Real-time scheduling over Markovian channels: when partial observability meets hard deadlines. In: GLOBECOM, pp. 1–5. IEEE (2011)
Appendices
A Algorithm
In this section, we describe the algorithmic approach to obtain a sufficient reward order, which is used for monotonicity checking.
Our algorithm takes as input a pMC and bounds for all states. It returns a set of annotated reward orders \((\mathcal {A}, \preceq ^{{\texttt {rew}}, \mathcal {A}})\), where \(\mathcal {A}\) is a set of assumptions of the form \(s \preceq ^{{\texttt {rew}}}_{}s'\). The algorithm iteratively computes a set of reward orders. At this stage, neither the bounds nor the assumptions are used yet.
Initially, we start with the trivial order for all \(s\in S\). The queue is initialised (l.1) with an empty set of assumptions, the trivial order, and all non-target states in S. At each iteration, we pick an order from the queue. If all states have been processed (l.5), we are done building the reward order. Otherwise, we pick a state s to process (l.7) and try to order this state based on the reasoning of Lemmas 2 and 3 in Algorithm 1. We pick this state in reverse topological order, as this increases the likelihood that all successor states are already contained in the order. Once we have processed s, we put the (possibly) extended order back in the queue, together with the set of assumptions and the states that still need to be processed (l.25). Note that if the state reward is 0 (l.9–18), the algorithm is equivalent to Algorithm 1 extended to treat cycles, as found in [39, Sect. 4.1 and 4.3]. As assumptions are not used, Algorithm 1 in fact computes a single reward order; it runs in time linear in the number of transitions. Note that this order is not necessarily sufficient for the pMC.
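The following structural sketch outlines the queue-based loop just described; it is not the paper's Algorithm 1. The helpers order_state (standing in for the criteria of Lemmas 2 and 3) and pick_next_state (the reverse topological choice) are assumed as black boxes, and the order is represented abstractly.

```python
# Structural sketch of the queue-based construction; not the paper's Algorithm 1.
from collections import deque

def build_reward_orders(pmc, non_target_states, order_state, pick_next_state):
    """Return (assumptions, reward order) pairs in which all states have been processed."""
    trivial_order = {}                                        # placeholder for the trivial order
    queue = deque([(frozenset(), trivial_order, frozenset(non_target_states))])   # (l.1)
    results = []
    while queue:
        assumptions, order, todo = queue.popleft()
        if not todo:                                          # all states processed (l.5)
            results.append((assumptions, order))
            continue
        s = pick_next_state(todo, pmc)                        # reverse topological order (l.7)
        extended = order_state(order, s, pmc)                 # reasoning of Lemmas 2 and 3
        queue.append((assumptions, extended, todo - {s}))     # re-enqueue (l.25)
    return results
```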
Assumptions. To obtain a sufficient reward order, we consider assumptions, as described in Sect. 3.3. We exploit the annotations (called assumptions) that were ignored so far. Recall from Definition 6 that a reward order is not sufficient at a parametric state s if its successors \(s_1\) and \(s_2\) are not totally ordered. To remain sound, we consider all possible orderings of \(s_1\) and \(s_2\). For each possible ordering, we add a copy of the reward order to the queue and continue as if the ordering of \(s_1\) and \(s_2\) were known. We extend Algorithm 1 with assumptions by adding the code of Algorithm 2 after lines 18 and 24.
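A minimal sketch of this branching step (in the spirit of Algorithm 2, with illustrative names only): if the current order does not relate the successors \(s_1\) and \(s_2\), one copy of the order is enqueued per possible ordering and the construction continues as if that ordering were known.

```python
# Sketch of the assumption step; all names are illustrative, not the paper's code.
def branch_on_assumption(queue, assumptions, order, todo, s1, s2, add_relation):
    for below, above in ((s1, s2), (s2, s1)):       # assume s1 <=_rew s2, resp. s2 <=_rew s1
        new_assumptions = assumptions | {(below, above)}
        new_order = add_relation(dict(order), below, above)
        queue.append((new_assumptions, new_order, set(todo)))
```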
Using Bounds. As creating assumptions may lead to an exponential blow-up in the number of orders in the queue, we also consider the situation in which we have bounds \(L_R(s)\) and \(U_R(s)\) at our disposal satisfying, for all \(\vec{u} \in R\),
\[ L_R(s) \;\le\; \textsf{ER}_{\mathcal{M}}^{s \rightarrow T}(\vec{u}) \;\le\; U_R(s). \]
As these bounds are a relatively cheap way to order states, we extend Algorithm 1 by adding Algorithm 3 directly after line 7. This algorithm uses Lemma 6 to order the successor states of s, so as to obtain a sufficient order at s (l.1–5). Furthermore, it uses Lemmas 7 and 8 to order a state relative to its successor states (l.10–15).
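The following sketch illustrates how such bounds can be used to order states. It is an illustration under our own assumptions, not the exact statements of Lemmas 6–8; f_max and f_min denote an upper and a lower bound of the transition function f on the region R, which makes the "for all \(\vec{u} \in R\)" premises conservative.

```python
# Illustrative bound-based ordering checks; not the paper's Algorithm 3.
def order_by_bounds(L, U, s1, s2):
    """Order two states from interval bounds on their expected total rewards."""
    if U[s1] <= L[s2]:
        return (s1, s2)            # s1 <=_rew s2 holds on all of R
    if U[s2] <= L[s1]:
        return (s2, s1)            # s2 <=_rew s1 holds on all of R
    return None                    # the bounds are inconclusive

def order_state_vs_successor(reward, f_max, f_min, L, U, s, s1, s2):
    """Lemma 7-style check relating s to its successor s2, given s1 <=_rew s2."""
    if reward[s] >= f_max * (U[s2] - L[s1]):
        return (s2, s)             # s2 <=_rew s
    if L[s2] >= U[s1] and reward[s] <= f_min * (L[s2] - U[s1]):
        return (s, s2)             # s <=_rew s2
    return None
```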
B Results
Table 3 shows the \(\varepsilon \)-optimal values for the integrated approach, both for \(\varepsilon =0.1\) and for \(\varepsilon = 0.05\). For all entries, the results for \(\varepsilon =0.1\) and \(\varepsilon = 0.05\) with heuristics coincide. This confirms that our initial guess for CurMax is a good one, even though splitting the region is still necessary: the bound found with parameter lifting is not yet tight enough to confirm that CurMax is indeed an \(\varepsilon \)-optimum. When maximizing the 4\(\,\times \,\)4 grid (2), we find a value of 10685 with heuristics and 10679 without. This difference can be explained by the new way of updating CurMax.
C Proofs
C.1 Proof Sketch of Theorem 1
Theorem 1
Proof
We sketch the proof of Theorem 1, which follows along the lines of [39, Thm. 2]. First, we lift the notion of local monotonicity (Definition 4) to local monotonicity for n steps. Second, we claim that local monotonicity implies local monotonicity for n steps, and that global monotonicity follows from this.
We define the length of a finite path \(\hat{\pi }= s_0 s_1 \ldots s_n\) as \(|\hat{\pi }| = n\). Let \({ Paths}^{n}(s)\) denote the set of all paths of length n starting from state \(s\in S\).
Definition 8
(Locally monotonic increasing for n steps). \(\textsf{ER}_{\mathcal {M}}^{s \rightarrow T}\) is locally monotonic increasing for n steps in parameter p (at s) on region R, denoted \(\textsf{ER}_{\mathcal {M}}^{s \rightarrow T}{\uparrow _{p}^{\ell , {n}, R}}\), iff for all \(\vec {u} \in R\):
For \(n=1\), Definition 8 corresponds to Definition 4. The claim that local monotonicity implies local monotonicity for n steps (for all n) can be proven by induction over n. The claim that global monotonicity follows from this can be shown similarly to the proof of [39, Theorem 2, Equation 2], the main difference being that no state \(s \in S\) exists for which \(\textsf{Pr}^{s \rightarrow T} = 0\), cf. Remark 1.
C.2 Proof of Lemma 7
Lemma 7
For any state s with \(\textsf {succ}(s) = \{ s_1, s_2 \}\), \(f = \mathcal {P}(s, s_1)\), region R and \(s_1 \preceq ^{{\texttt {rew}}}_{}s_2\): if for all \(\vec {u} \in R\)
1. \(r(s) \ge f(\vec {u}) \cdot \left( U_R(s_2) - L_R(s_1)\right) \ \text { then } \ s_2 \preceq ^{{\texttt {rew}}}_{}s \text {, and }\)
2. \( r(s) \le f(\vec {u}) \cdot \left( L_R(s_2) - U_R(s_1)\right) \ \text { then } \ s \preceq ^{{\texttt {rew}}}_{}s_2.\)
Proof
Let \(f' = \mathcal {P}(s, s_2) = 1{-}f\). For the first case, we derive for all \(\vec {u} \in R\):
\[
\begin{aligned}
r(s) + f(\vec{u}) \cdot \textsf{ER}_{\mathcal {M}}^{s_1 \rightarrow T}(\vec{u}) + f'(\vec{u}) \cdot \textsf{ER}_{\mathcal {M}}^{s_2 \rightarrow T}(\vec{u})
&\ge f(\vec{u}) \cdot \bigl( U_R(s_2) - L_R(s_1) \bigr) + f(\vec{u}) \cdot \textsf{ER}_{\mathcal {M}}^{s_1 \rightarrow T}(\vec{u}) + f'(\vec{u}) \cdot \textsf{ER}_{\mathcal {M}}^{s_2 \rightarrow T}(\vec{u}) \\
&\ge f(\vec{u}) \cdot \bigl( \textsf{ER}_{\mathcal {M}}^{s_2 \rightarrow T}(\vec{u}) - \textsf{ER}_{\mathcal {M}}^{s_1 \rightarrow T}(\vec{u}) \bigr) + f(\vec{u}) \cdot \textsf{ER}_{\mathcal {M}}^{s_1 \rightarrow T}(\vec{u}) + f'(\vec{u}) \cdot \textsf{ER}_{\mathcal {M}}^{s_2 \rightarrow T}(\vec{u}) \\
&= \bigl( f(\vec{u}) + f'(\vec{u}) \bigr) \cdot \textsf{ER}_{\mathcal {M}}^{s_2 \rightarrow T}(\vec{u}) \;=\; \textsf{ER}_{\mathcal {M}}^{s_2 \rightarrow T}(\vec{u}),
\end{aligned}
\]
where the first inequality uses the premise \(r(s) \ge f(\vec{u}) \cdot \left( U_R(s_2) - L_R(s_1)\right)\) and the second uses \(L_R(s_1) \le \textsf{ER}_{\mathcal {M}}^{s_1 \rightarrow T}(\vec{u})\) and \(\textsf{ER}_{\mathcal {M}}^{s_2 \rightarrow T}(\vec{u}) \le U_R(s_2)\).
From this, it immediately follows for all \(\vec {u} \in R\):
\[ \textsf{ER}_{\mathcal {M}}^{s \rightarrow T}(\vec{u}) \;=\; r(s) + f(\vec{u}) \cdot \textsf{ER}_{\mathcal {M}}^{s_1 \rightarrow T}(\vec{u}) + f'(\vec{u}) \cdot \textsf{ER}_{\mathcal {M}}^{s_2 \rightarrow T}(\vec{u}) \;\ge\; \textsf{ER}_{\mathcal {M}}^{s_2 \rightarrow T}(\vec{u}), \]
so \(s_2 \preceq ^{{\texttt {rew}}}_{}s\).
For the second case, the proof follows in a similar way.
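As a sanity check of the first case of Lemma 7 with made-up numbers (not taken from the paper), consider \(r(s) = 3\), \(f(\vec{u}) = 0.4\), bounds \(L_R(s_1) = 2\) and \(U_R(s_2) = 6\), and expected rewards \(2.5\) and \(5\) for \(s_1\) and \(s_2\), consistent with these bounds and with \(s_1 \preceq ^{{\texttt {rew}}}_{}s_2\):

```python
# Numeric sanity check of the first case of Lemma 7 with made-up values.
r_s, f_u = 3.0, 0.4
L_s1, U_s2 = 2.0, 6.0
ER_s1, ER_s2 = 2.5, 5.0

assert r_s >= f_u * (U_s2 - L_s1)                 # premise: 3 >= 0.4 * 4 = 1.6

ER_s = r_s + f_u * ER_s1 + (1 - f_u) * ER_s2      # 3 + 1.0 + 3.0 = 7.0
assert ER_s >= ER_s2                              # 7.0 >= 5.0, i.e. s2 <=_rew s
```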