Abstract
We discuss a new applied probability model: a system whose evolution is described by a Markov chain (MC) with a known transition matrix on a discrete state space, where at each moment of discrete time a decision maker can apply one of three actions: continue, quit, or restart the MC at one of a finite number of fixed “restarting” points. This model generalizes a model of Katehakis and Veinott (Math. Oper. Res. 12:262, 1987), in which a restart to a unique point was allowed without any fee and the quit action was absent. Both models are related to the Gittins index and to another index defined in a Whittle family of stopping retirement problems. We propose a transparent recursive finite algorithm that solves our model in O(n³) operations.
References
Denardo, E. V., Rothblum, U. G., & Van der Heyden, L. (2004). Index policies for stochastic search in a forest with an application to R&D project management. Mathematics of Operations Research, 29, 162–181. doi:10.1287/moor.1030.0072.
Derman, C. (1970). Finite state Markovian decision processes. Orlando: Academic Press.
Feinberg, E. A., & Shwartz, A. (2002). Handbook of Markov decision processes. International series in operations research & management science (Vol. 40). Boston: Kluwer Academic.
Grassmann, W. K., Taksar, M. I., & Heyman, D. P. (1985). Regenerative analysis and steady state distributions for Markov chains. Operations Research, 33, 1107–1116. doi:10.1287/opre.33.5.1107.
Irle, A. (1980). On the best choice problem with random population size. Mathematical Methods of Operations Research, 24, 177–190. doi:10.1007/BF01919245.
Katehakis, M., & Derman, C. (1986). Computing optimal sequential allocation rules in clinical trials, Lecture notes—monograph series (Vol. 8, pp. 29–39).
Katehakis, M. N., & Veinott, A. F. (1987). The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research, 12, 262–268. doi:10.1287/moor.12.2.262.
Kemeny, J. G., Snell, J. L., & Knapp, A. W. (1976). Finite Markov chains. Princeton: Springer.
Mitten, L. G. (1960). An analytic solution to the least cost testing sequence problem. International Journal of Industrial Engineering, 11, 17.
Puterman, M. (2005). Markov decision processes: discrete stochastic dynamic programming. New York: Wiley.
Sheskin, T. J. (1985). Technical note—a Markov chain partitioning algorithm for computing steady state probabilities. Operations Research, 33, 228–235.
Sheskin, T. J. (1999). State reduction in a Markov decision process. International Journal of Mathematical Education in Science and Technology, 30(2), 167–185.
Sonin, I. (1999a). The elimination algorithm for the problem of optimal stopping. Mathematical Methods of Operations Research, 49(1), 111–123. doi:10.1007/s001860050016.
Sonin, I. (1999b). The state reduction and related algorithms and their applications to the study of Markov chains, graph theory, and the optimal stopping problem. Advances in Mathematics, 145, 159–188. doi:10.1006/aima.1998.1813.
Sonin, I. (2006). The optimal stopping of a Markov chain and recursive solution of Poisson and Bellman equations. In Y. Kabanov, R. Liptser, & J. Stoyanov (Eds.), From stochastic calculus to mathematical finance. The Shiryaev Festschrift (pp. 609–621). Berlin: Springer.
Sonin, I. (2008). A generalized Gittins index for a Markov chain and its recursive calculation. Statistics & Probability Letters, 78(12), 1526–1533. doi:10.1016/j.spl.2008.01.049.
Sonin, I. M. (2011). Optimal stopping of Markov chains and three abstract optimization problems. Stochastics, 83(4–6), 405–414. doi:10.1080/17442508.2010.514051.
Sonin, I., & Presman, E. (1972). The problem of best choice in the case of a random number of objects. Theory of Probability and Its Applications, 17, 695–706.
Sonin, I., & Thornton, J. (2001). Recursive algorithm for the fundamental/group inverse matrix of a Markov chain from an explicit formula. SIAM Journal on Matrix Analysis and Applications, 23(1), 209–224. doi:10.1137/S0895479899351234.
Stewart, W. (2009). Probability, Markov chains, queues, and simulation: the mathematical basis of performance modeling. Princeton: Princeton University Press.
Tsitsiklis, J. N. (1994). A short proof of the Gittins index theorem. The Annals of Applied Probability, 4, 194–199. doi:10.1214/aoap/1177005207.
Whittle, P. (1980). Multi-armed bandits and the Gittins index. Journal of the Royal Statistical Society. Series B, Statistical Methodology, 42(2), 143–149.
Ye, Y. (2011). The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4), 593–603. doi:10.1287/moor.1110.0516.
Appendix
5.1 Three abstract optimization problems
The common part of all three problems described above in Sect. 3 is a maximization over the set of all partitions of the state set X into three sets: continuation, restart and quit regions. This is a special case of a very general situation of optimization on an abstract index set.
Let us consider the following three abstract optimization problems 1, 2 and 3. Suppose there is an abstract index set U, and let A={a_u} and B={b_u} be two sets of real numbers indexed by the elements of U. Suppose also that an assumption (U) holds (see Sonin 2011).
In all three problems a decision maker (DM) knows the sets U, A, and B.
Problem 1 (restart problem). Find solution(s) of the equation
$$h=\sup_{u\in U}\bigl[a_{u}+(1-b_{u})h\bigr]\equiv H(h). \qquad (41)$$
Equation (41) is the Bellman (optimality) equation for the “value of the game”, i.e. the supremum over all possible strategies in an optimization problem admitting two equivalent interpretations. In both cases the set U represents the set of available actions. A DM can select one of them, say u, and then she obtains a reward $a_{u}$; according to the first interpretation, with probability $b_{u}$ the game is terminated, and with complementary probability $1-b_{u}$ she is again in the initial situation, i.e. she can select any action. Her goal is to maximize the total (undiscounted) reward. According to the second interpretation, the game continues sequentially without the possibility of random termination, but the value $1-b_{u}$ is now not a probability but a discount factor applied to the future rewards after action u was used at the first step.
Problem 2 (ratio (cycle) problem). Find
$$\alpha=\sup_{u\in U}\frac{a_{u}}{b_{u}}. \qquad (42)$$
The interpretation of this problem is straightforward: a DM can select an action u only once, and her goal is to maximize the ratio in (42), the one-step reward per “chance of termination”. Since the game is terminated after the first trial anyway, $1/b_{u}$ has the interpretation of a “multiplier” applied to the “direct” reward $a_{u}$.
Let H(k) be the function defined by the right-hand side of (41).
Problem 3 (a parametric family of retirement problems). Find w, defined as follows: given a parameter k, −∞<k<∞, let
$$v(k)=k\vee H(k),\qquad w=\inf\bigl\{k:v(k)=k\bigr\}. \qquad (43)$$
In this problem, given the number k, a DM has the following one-step choice: to obtain k immediately, or to select an action u once, obtain the reward $a_{u}$, and after that additionally obtain k with probability $1-b_{u}$, and zero with the complementary probability.
Using the fact that the functions H(k) and v(k), −∞<k<∞, are nondecreasing, continuous, and convex (concave up), the following theorem was proved in Sonin (2011).
Theorem 5 (Abstract optimization equality)
(a) The solution h of (41) is finite and unique;
(b) h = α = w;
(c) the optimal index (or an optimizing sequence) for any of the three problems is the optimal index (an optimizing sequence, respectively) for the other two problems.
A brief discussion of one more problem, initially analyzed in the one-page seminal paper Mitten (1960), and of its relation to the GI and GGI, can be found in Sonin (2008).
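To see Theorem 5 at work, here is a small numerical sketch in Python (the index set U and the numbers $a_u$, $b_u$ are invented purely for illustration and do not come from the paper). It computes the fixed point h of (41) by iteration, the ratio α of (42) directly, and the retirement threshold w of (43) by bisection; the three values coincide, and the same index attains all three optima.

```python
# Toy check of Theorem 5 (abstract optimization equality).
# U is a finite index set; a[u] are rewards, b[u] in (0, 1] are
# termination probabilities.  All numbers are invented for illustration.
a = {"u1": 1.0, "u2": 2.5, "u3": 0.4}
b = {"u1": 0.50, "u2": 0.80, "u3": 0.10}
U = list(a)

def H(k):
    """Right-hand side of (41): H(k) = max_u [a_u + (1 - b_u) k]."""
    return max(a[u] + (1.0 - b[u]) * k for u in U)

# Problem 1: solve h = H(h) by fixed-point iteration
# (H is a contraction with modulus max_u (1 - b_u) < 1).
h = 0.0
for _ in range(500):
    h = H(h)

# Problem 2: alpha = max_u a_u / b_u, attained at u_star.
alpha = max(a[u] / b[u] for u in U)
u_star = max(U, key=lambda u: a[u] / b[u])

# Problem 3: w = inf{k : v(k) = k}, v(k) = max(k, H(k));
# since k - H(k) is nondecreasing, w can be found by bisection.
lo, hi = 0.0, 1e6
for _ in range(100):
    mid = (lo + hi) / 2
    if H(mid) > mid:
        lo = mid
    else:
        hi = mid
w = hi

print(h, alpha, w, u_star)   # h, alpha and w agree up to numerical tolerance
```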
In the next section we shall formally show how the three problems described in Sect. 3 can be presented as abstract problems.
5.2 Proofs of some statements
Proof of Proposition 3
For any set $S\subset X$, by the definition of $\tau=\tau_{S}$ and the definition of the function $g(x|k)$, we have
Using the definitions of $R^{\pi}(x)$ and $Q^{\pi}(x)$ in (26) and (25), the definition of the function $g(x|k)$ and the definition of the partition $\pi_{0}$, we can see that for all partitions $\pi$ with the same set $S$ we have $C^{\pi}(x)=C^{\pi_{0}}(x)$ and $R_{q}^{\pi}(x)+R_{r}^{\pi}(x)+Q_{r}^{\pi}(x)k\leq R_{q}^{\pi_{0}}(x)+R_{r}^{\pi_{0}}(x)+Q_{r}^{\pi_{0}}(x)k$. Hence $R^{\pi}(x)+(1-Q^{\pi}(x))k\leq R^{\pi_{0}}(x)+(1-Q^{\pi_{0}}(x))k$. Thus $\sup_{\pi>0}[\ldots]=\sup_{\tau>0}[\ldots]$, where $[\ldots]$ is the expression in square brackets in (36). □
Proof of Theorem 3, using Theorem 4
Given $x\in X$, let us introduce the set of indices $U=\{u\}=\{\pi>0\}\cup\{0\}$, where $\pi>0$ is any 3-partition such that $x\in C$, and $0$ is an index such that $a_{0}=q(x)-r(x)$, $b_{0}=1$. For any $u\neq 0$, i.e. for a partition $\pi=(C,S_{q},S_{r})$, $S=S_{q}\cup S_{r}$, and $\tau=\tau_{S}$, we define $R^{\pi}(x)=E_{x}\sum_{n=0}^{\tau-1}c(Z_{n})$, the total expected reward until moment $\tau$, and $Q^{\pi}(x)=P_{x}(Z_{\tau}\in S_{q})$, the probability of termination on $[0,\tau)$. The rewards $a_{u}$ and probabilities $b_{u}$ are defined as $a_{u}=R^{\pi}(x)-r(x)$, $b_{u}=Q^{\pi}(x)$. Then the function $H(k)$ coincides with $\sup_{\tau>0}E_{x}g(Z_{\tau})$, where $g(x)=k$. Respectively, $v(x|k)=k\vee H(k)=\sup_{\tau\geq 0}E_{x}g(Z_{\tau})$, i.e. $v(x|k)$ is the value function in an OS problem for the MC in model $M(k)$. Then, according to formulas (29), (30), (33) and (34), the three Problems 1, 2, 3 from Sect. 4 are represented as the three abstract problems of this section, and we can apply Theorem 4. □
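For concreteness, the quantities $R^{\pi}(x)$ and $Q^{\pi}(x)$ appearing in this proof can be computed for a small chain by solving linear systems on the continuation set $C$, as in the following sketch. The four-state chain, the one-step rewards and the partition are hypothetical (not taken from the paper), and numpy is assumed.

```python
import numpy as np

# Hypothetical 4-state chain: C = {0, 1} (continue), S_q = {2} (quit),
# S_r = {3} (restart); transition matrix and rewards are made up.
P = np.array([[0.2, 0.5, 0.2, 0.1],
              [0.3, 0.3, 0.1, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
c = np.array([1.0, 2.0])            # one-step rewards c(x) for x in C
C, S_q = [0, 1], [2]

P_CC = P[np.ix_(C, C)]              # transitions inside C
P_Cq = P[np.ix_(C, S_q)]            # transitions from C into the quit set

# R^pi(x) = E_x sum_{n < tau} c(Z_n): solve (I - P_CC) R = c on C.
R = np.linalg.solve(np.eye(len(C)) - P_CC, c)

# Q^pi(x) = P_x(Z_tau in S_q): solve (I - P_CC) Q = P_Cq * 1.
Q = np.linalg.solve(np.eye(len(C)) - P_CC, P_Cq.sum(axis=1))

print(R)   # expected reward accumulated before the exit time tau
print(Q)   # probability that the chain leaves C through S_q
```

Here $(I-P_{CC})^{-1}$ plays the role of the fundamental matrix of the chain restricted to $C$.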
Proof of formula (15)
Note that if $D=\emptyset$ then $W_{\emptyset}=P_{\emptyset}=P$, and if $D=\{y\}$ then $p_{D\diagdown y}(\cdot,y)=p(\cdot,y)$, and formula (14) with $w'=w_{D}$ and $w=p$ for $z=y$ implies that
so formula (15) holds. It remains to prove that if it holds for a set $D\subset X$, then it holds for the set $D\cup z$, $z\notin D$. For $D\subset X$ let us define the matrix $S_{D}$ as follows: $s_{D}(\cdot,y)=w_{D}(\cdot,y)=p_{D}(\cdot,y)$ for $y\notin D$ and $s_{D}(\cdot,y)=p_{D\diagdown y}(\cdot,y)$ for $y\in D$. To simplify notation let us denote the matrices $W_{D}=W_{1}$, $W_{D\cup z}=W_{2}$, $S_{D}=S_{1}$, $S_{D\cup z}=S_{2}$. If formula (15) holds for a set $D\subset X$, then two equivalent statements are true
The first formula is our assumption, and the second formula follows from the first one if we apply the first formula for $x=y$ to obtain the equality $1-s_{1}(y,y)=1/(1+w_{1}(y,y))$. Our goal is to prove
For $y=z$ formula (47) can be checked directly: formula (14) for $y=z$ implies $w_{2}(\cdot,z)=w_{1}(\cdot,z)/(1-w_{1}(z,z))$, and for $z\notin D$ by the definition of $S_{2}$ we have $s_{2}(\cdot,z)=s_{1}(\cdot,z)$, and by the definition of $S_{1}$ we have $s_{1}(\cdot,z)=w_{1}(\cdot,z)$. For $y\in D$, by the definition of $W_{2}$, i.e. by formula (14), using formula (46) and the equality $w_{1}(\cdot,z)=s_{1}(\cdot,z)$ (since $z\notin D$), we have
By the definition of $S_{2}=S_{D\cup z}$, using formula (14) with $w'=p_{(D\cup z)\diagdown y}$ and $w=p_{D\diagdown y}$ for $y\in D$, we have
where $A=1/(1-p_{D\diagdown y}(z,z))$. Using formula (14) with $w'=p_{D}$ and $w=p_{D\diagdown y}$, we have
Applying this formula for $x=y$, we obtain $\frac{p_{D\diagdown y}(y,z)}{1-p_{D\diagdown y}(y,y)}=p_{D}(y,z)=s_{D}(y,z)=s_{1}(y,z)$ and hence
Using formula (50) for $x=z$, we obtain $p_{D\diagdown y}(z,z)=s_{1}(z,z)-s_{1}(z,y)s_{1}(y,z)$, so
Now using this equality and formula (50), formula (49) can be written as
Using $x=y$ in (52), we obtain $s_{2}(y,y)=s_{1}(y,y)(1-s_{1}(z,z))A+s_{1}(y,z)s_{1}(z,y)A$. Then, using formula (51), we obtain $1-s_{2}(y,y)=(1-s_{1}(y,y))(1-s_{1}(z,z))A$, and dividing both parts of formula (52) by $1-s_{2}(y,y)$, we obtain that (48) for $y\in D$ can be written as (47). □
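Formula (14) itself is not reproduced in this appendix. The way it is used above, for example to obtain $w_{2}(\cdot,z)=w_{1}(\cdot,z)/(1-w_{1}(z,z))$, is consistent with an update of the form $w'(x,y)=w(x,y)+w(x,z)\,w(z,y)/(1-w(z,z))$, the state-elimination identity familiar from the state reduction approach (cf. Sonin 1999b; Sonin & Thornton 2001). Under that assumption only, the following sketch builds the matrices $W_{D}$ recursively, starting from $W_{\emptyset}=P$ and adding one state at a time; the 3-state substochastic matrix is invented for illustration.

```python
import numpy as np

def add_state(w, z):
    """One application of the (reconstructed) formula (14):
    w'(x, y) = w(x, y) + w(x, z) * w(z, y) / (1 - w(z, z))."""
    return w + np.outer(w[:, z] / (1.0 - w[z, z]), w[z, :])

# Made-up substochastic matrix P; with these numbers 1 - w(z, z)
# stays positive along the recursion.
P = np.array([[0.1, 0.4, 0.3],
              [0.2, 0.2, 0.4],
              [0.3, 0.1, 0.2]])

# Build W_D for D = {0, 1}, starting from W_empty = P.
W = P.copy()
for z in [0, 1]:
    W = add_state(W, z)

print(W)
```

This is only a sketch of the recursion pattern; the actual formulas (14) and (15) and the meaning of $W_{D}$ are given in the body of the paper.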
Sonin, I.M., Steinberg, C. Continue, quit, restart probability model. Ann Oper Res 241, 295–318 (2016). https://doi.org/10.1007/s10479-012-1089-2