Continue, quit, restart probability model

Annals of Operations Research

Abstract

We discuss a new applied probability model: there is a system whose evolution is described by a Markov chain (MC) with a known transition matrix on a discrete state space, and at each moment of discrete time a decision maker can apply one of three possible actions: continue, quit, or restart the MC at one of a finite number of fixed "restart" points. Such a model is a generalization of a model due to Katehakis and Veinott (Math. Oper. Res. 12:262, 1987), where a restart to a unique point was allowed without any fee and the quit action was absent. Both models are related to the Gittins index and to another index defined in a Whittle family of stopping retirement problems. We propose a transparent recursive finite algorithm that solves our model in O(n³) operations.


References

  • Denardo, E. V., Rothblum, U. G., & Van der Heyden, L. (2004). Index policies for stochastic search in a forest with an application to R&D project management. Mathematics of Operations Research, 29, 162–181. doi:10.1287/moor.1030.0072.

  • Derman, C. (1970). Finite state Markovian decision processes. Orlando: Academic Press.

  • Feinberg, E. A., & Shwartz, A. (2002). Handbook of Markov decision processes. International series in operations research & management science (Vol. 40). Boston: Kluwer Academic.

  • Grassmann, W. K., Taksar, M. I., & Heyman, D. P. (1985). Regenerative analysis and steady state distributions for Markov chains. Operations Research, 33, 1107–1116. doi:10.1287/opre.33.5.1107.

  • Irle, A. (1980). On the best choice problem with random population size. Mathematical Methods of Operations Research, 24, 177–190. doi:10.1007/BF01919245.

  • Katehakis, M., & Derman, C. (1986). Computing optimal sequential allocation rules in clinical trials. Lecture notes—monograph series (Vol. 8, pp. 29–39).

  • Katehakis, M. N., & Veinott, A. F. (1987). The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research, 12, 262–268. doi:10.1287/moor.12.2.262.

  • Kemeny, J. G., Snell, J. L., & Knapp, A. W. (1976). Finite Markov chains. Princeton: Springer.

  • Mitten, L. G. (1960). An analytic solution to the least cost testing sequence problem. International Journal of Industrial Engineering, 11, 17.

  • Puterman, M. (2005). Markov decision processes: discrete stochastic dynamic programming. New York: Wiley.

  • Sheskin, T. J. (1985). Technical note—a Markov chain partitioning algorithm for computing steady state probabilities. Operations Research, 33, 228–235.

  • Sheskin, T. J. (1999). State reduction in a Markov decision process. International Journal of Mathematical Education in Science and Technology, 30(2), 167–185.

  • Sonin, I. (1999a). The elimination algorithm for the problem of optimal stopping. Mathematical Methods of Operations Research, 49(1), 111–123. doi:10.1007/s001860050016.

  • Sonin, I. (1999b). The state reduction and related algorithms and their applications to the study of Markov chains, graph theory, and the optimal stopping problem. Advances in Mathematics, 145, 159–188. doi:10.1006/aima.1998.1813.

  • Sonin, I. (2006). The optimal stopping of a Markov chain and recursive solution of Poisson and Bellman equations. In Y. Kabanov, R. Liptser, & J. Stoyanov (Eds.), From stochastic calculus to mathematical finance. The Shiryaev Festschrift (pp. 609–621). Berlin: Springer.

  • Sonin, I. (2008). A generalized Gittins index for a Markov chain and its recursive calculation. Statistics & Probability Letters, 78(12), 1526–1533. doi:10.1016/j.spl.2008.01.049.

  • Sonin, I. M. (2011). Optimal stopping of Markov chains and three abstract optimization problems. Stochastics, 83(4–6), 405–414. doi:10.1080/17442508.2010.514051.

  • Sonin, I., & Presman, E. (1972). The problem of best choice in the case of a random number of objects. Theory of Probability and Its Applications, 17, 695–706.

  • Sonin, I., & Thornton, J. (2001). Recursive algorithm for the fundamental/group inverse matrix of a Markov chain from an explicit formula. SIAM Journal on Matrix Analysis and Applications, 23(1), 209–224. doi:10.1137/S0895479899351234.

  • Stewart, W. (2009). Probability, Markov chains, queues, and simulation: the mathematical basis of performance modeling. Princeton: Princeton University Press.

  • Tsitsiklis, J. N. (1994). A short proof of the Gittins index theorem. The Annals of Applied Probability, 4, 194–199. doi:10.1214/aoap/1177005207.

  • Whittle, P. (1980). Multi-armed bandits and the Gittins index. Journal of the Royal Statistical Society. Series B, Statistical Methodology, 42(2), 143–149.

  • Ye, Y. (2011). The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4), 593–603. doi:10.1287/moor.1110.0516.


Author information


Corresponding author

Correspondence to Constantine Steinberg.

Appendix


5.1 Three abstract optimization problems

The common part of all three problems described above in Sect. 3 is maximization over the set of all partitions of the state space X into three sets: the continuation, restart, and quit regions. This is a special case of a very general situation: optimization over an abstract index set.

Let us consider the following three abstract optimization problems 1, 2 and 3. Suppose there is an abstract index set U, and let A = {a_u} and B = {b_u} be two sets of real numbers indexed by the elements of U. Suppose that the following assumption (U) holds:

$$-\infty<a_{u}\leq a<\infty,\qquad 0<b\leq b_{u}\leq1. $$
(U)

In all three problems a DM knows sets U,A and B.

Problem 1: restart problem.

Find solution(s) of the equation

$$h=\sup_{u\in U}\bigl[a_{u}+(1-b_{u})h\bigr]\equiv H(h). $$
(41)

Equation (41) is a Bellman (optimality) equation for the "value of the game", i.e. the supremum over all possible strategies in an optimization problem that has two equivalent interpretations. In both cases the set U represents the set of available actions. A DM can select one of them, say u, and then she obtains a reward a_u; according to the first interpretation, with probability b_u the game is terminated, and with the complementary probability 1 − b_u she is again in the initial situation, i.e. she can select any action. Her goal is to maximize the total (undiscounted) reward. According to the second interpretation, the game continues sequentially without the possibility of random termination, but the value 1 − b_u is now not a probability but a discount factor applied to the future rewards after action u was used at the first step.
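As a concrete illustration (not from the paper), equation (41) can be solved for a finite index set U by iterating the map H, which under assumption (U) is a contraction with modulus at most 1 − b. A minimal Python sketch, using hypothetical arrays a, b:

# Minimal sketch with hypothetical data: solve h = sup_u [a_u + (1 - b_u) h], eq. (41),
# for a finite index set U by fixed-point iteration on H.
a = [1.0, 2.5, 0.3]   # rewards a_u (hypothetical)
b = [0.5, 0.9, 0.2]   # termination probabilities b_u (hypothetical), 0 < b <= b_u <= 1

def H(h):
    """Right-hand side of (41) for a finite U."""
    return max(au + (1.0 - bu) * h for au, bu in zip(a, b))

h = 0.0
for _ in range(10_000):          # h <- H(h); converges since max(1 - b_u) < 1
    h_new = H(h)
    if abs(h_new - h) < 1e-12:
        break
    h = h_new

print(h)                         # value of the restart game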

Problem 2: ratio (cycle) problem.

Find

$$\alpha=\sup_{u\in U}\frac{a_{u}}{b_{u}}. $$
(42)

The interpretation of this problem is straightforward: a DM can select an action u only once, and her goal is to maximize the ratio in (42), the one-step reward per "chance of termination". Since the game is terminated after the first trial anyway, 1/b_u can be interpreted as a "multiplier" applied to the "direct" reward a_u.

Let H(k) be the function defined by the right-hand side of (41).

Problem 3: a parametric family of retirement problems.

Find w, defined as follows: given a parameter k, −∞ < k < ∞, let

$$v(k)=k\vee H(k),\qquad w=\inf\bigl\{k:v(k)=k\bigr\}. $$
(43)

In this problem, given a number k, a DM has the following one-step choice: to obtain k immediately, or to select an action u once, obtain a reward a_u, and after that additionally obtain k with probability 1 − b_u, or zero with the complementary probability b_u.
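Continuing the hypothetical finite example used after (41) (again, not from the paper): under assumption (U) the function H(k) − k is continuous and strictly decreasing, so w in (43) is the unique root of H(k) = k and can be found by bisection, as in the following Python sketch.

# Minimal sketch with hypothetical data: the retirement value w of (43),
# w = inf{k : v(k) = k} = inf{k : H(k) <= k}, found by bisection.
a = [1.0, 2.5, 0.3]
b = [0.5, 0.9, 0.2]

def H(k):
    return max(au + (1.0 - bu) * k for au, bu in zip(a, b))

# H(k) - k is strictly decreasing (its slope is at most -min(b) < 0),
# so first bracket the root of H(k) = k, then bisect.
lo, hi = -1.0, 1.0
while H(lo) <= lo:
    lo *= 2.0
while H(hi) > hi:
    hi *= 2.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if H(mid) > mid else (lo, mid)

print(hi)   # w; by the theorem below it coincides with h from (41) and alpha from (42)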

Using the fact that the functions H(k) and v(k), −∞ < k < ∞, are nondecreasing, continuous, and convex (concave up), the following theorem was proved in Sonin (2011).

Theorem 5

(Abstract optimization equality)

  (a) The solution h of (41) is finite and unique;

  (b) h = α = w;

  (c) the optimal index (or an optimizing sequence) for any of the three problems is the optimal index (respectively, an optimizing sequence) for the other two problems.

See Sonin (2008) for a brief discussion of one more problem, initially analyzed in the one-page seminal paper of Mitten (1960), and of its relation to the GI and GGI.
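For the hypothetical data used in the two sketches above, the equality h = α = w of Theorem 5 is easy to check numerically: α in (42) is just the largest ratio a_u/b_u, and it coincides with the fixed point of H. A short Python check:

# Minimal sketch with hypothetical data: check h = alpha (= w) from Theorem 5.
a = [1.0, 2.5, 0.3]
b = [0.5, 0.9, 0.2]

def H(k):
    return max(au + (1.0 - bu) * k for au, bu in zip(a, b))

alpha = max(au / bu for au, bu in zip(a, b))   # Problem 2, equation (42)

h = 0.0                                        # Problem 1, fixed point of (41)
for _ in range(10_000):
    h = H(h)

print(alpha, h, abs(alpha - h) < 1e-9)         # prints two matching values and True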

In the next section we shall formally show how the three problems described in Sect. 3 can be presented as abstract problems.

5.2 Proofs of some statements

Proof of Proposition 3

For any set S ⊆ X, by the definition of τ = τ_S and the definition of the function g(x|k), we have

(44)

Using the definitions of R^π(x) and Q^π(x) in (26) and (25), the definition of the function g(x|k) and the definition of the partition π_0, we can see that for all partitions π with the same set S we have C^π(x) = C^{π_0}(x) and R_q^π(x) + R_r^π(x) + Q_r^π(x)k ≤ R_q^{π_0}(x) + R_r^{π_0}(x) + Q_r^{π_0}(x)k. Hence R^π(x) + (1 − Q^π(x))k ≤ R^{π_0}(x) + (1 − Q^{π_0}(x))k. Thus sup_{π>0}[…] = sup_{τ>0}[…], where […] is the expression in square brackets in (36). □

Proof of Theorem 3, using Theorem 4

Given x ∈ X, let us introduce the set of indices U = {u} = {π > 0} ∪ {0}, where π > 0 is any 3-partition such that x ∈ C, and 0 is an index such that a_0 = q(x) − r(x), b_0 = 1. For any u ≠ 0, i.e. for a partition π = (C, S_q, S_r), S = S_q ∪ S_r, and τ = τ_S, we define R^π(x) = E_x ∑_{n=0}^{τ−1} c(Z_n), the total expected reward until moment τ, and Q^π(x) = P_x(Z_τ ∈ S_q), the probability of termination on [0, τ). The rewards a_u and the probabilities b_u we define as a_u = R^π(x) − r(x), b_u = Q^π(x). Then the function H(k) coincides with sup_{τ>0} E_x g(Z_τ), where g(x) = k. Respectively, v(x|k) = k ∨ H(k) = sup_{τ≥0} E_x g(Z_τ), i.e. v(x|k) is the value function in an OS problem for the MC in the model M(k). Then, according to formulas (29), (30), (33) and (34), the three Problems 1, 2, 3 from Sect. 4 are represented as the three abstract problems of this section, and we can apply Theorem 4. □

Proof of formula (15)

Note that if D = ∅ then W_∅ = P_∅ = P, and if D = {y} then p_{D∖y}(⋅,y) = p(⋅,y), and formula (14) with w′ = w_D and w = p for z = y implies that

$$w_{D}(\cdot,y)=p(\cdot,y)+p(\cdot,y)\frac{p(y,y)}{1-p(y,y)}=\frac {p(\cdot,y)}{1-p(y,y)},$$
(45)

so formula (15) holds. It remains to prove that if it holds for a set D ⊂ X, then it also holds for the set D ∪ z, z ∉ D. For D ⊂ X let us define the matrix S_D as follows: s_D(⋅,y) = w_D(⋅,y) = p_D(⋅,y) for y ∉ D, and s_D(⋅,y) = p_{D∖y}(⋅,y) for y ∈ D. To simplify notation let us denote the matrices W_D = W_1, W_{D∪z} = W_2, S_D = S_1, S_{D∪z} = S_2. If formula (15) holds for a set D ⊂ X, then the following two equivalent statements are true

$$w_{1}(\cdot,y)=\frac{s_{1}(\cdot,y)}{1-s_{1}(y,y)},\qquad s_{1}(\cdot,y)=\frac{w_{1}(\cdot,y)}{1+w_{1}(y,y)},\quad y\in D. $$
(46)

The first formula is our assumption, and the second formula follows from the first one if we apply the first formula for x = y to obtain the equality 1 − s_1(y,y) = 1/(1 + w_1(y,y)). Our goal is to prove

$$w_{2}(\cdot,y)=\frac{s_{2}(\cdot,y)}{1-s_{2}(y,y)},\quad y\in D\cup z. $$
(47)

For y = z formula (47) can be checked directly: formula (14) for y = z implies w_2(⋅,z) = w_1(⋅,z)/(1 − w_1(z,z)); since z ∉ D, by the definition of S_2 we have s_2(⋅,z) = s_1(⋅,z), and by the definition of S_1 we have s_1(⋅,z) = w_1(⋅,z). For y ∈ D, by the definition of W_2, i.e. by formula (14), using formula (46) and the equality w_1(⋅,z) = s_1(⋅,z) (since z ∉ D), we have

$$w_{2}(\cdot,y)=w_{1}(\cdot,y)+w_{1}(\cdot,z)\frac{w_{1}(z,y)}{1-w_{1}(z,z)}=\frac{1}{1-s_{1}(y,y)}\biggl(s_{1}(\cdot,y)+s_{1}(\cdot,z)\frac{s_{1}(z,y)}{1-s_{1}(z,z)}\biggr). $$
(48)

By the definition of S_2 = S_{D∪z}, using formula (14) with w′ = p_{(D∪z)∖y} and w = p_{D∖y} for y ∈ D, we have

$$s_{2}(\cdot,y)=s_{1}(\cdot,y)+p_{D\diagdown y}(\cdot ,z)s_{1}(z,y)A, $$
(49)

where A = 1/(1 − p_{D∖y}(z,z)). Using formula (14) with w′ = p_D and w = p_{D∖y}, we have

$$p_{D}(\cdot,z)=p_{D\diagdown y}(\cdot,z)+p_{D\diagdown y}(\cdot,y)\frac {p_{D\diagdown y}(y,z)}{1-p_{D\diagdown y}(y,y)},\quad y\in D,\ z\notin D.$$

Applying this formula for x=y, we obtain \(\frac{p_{D\diagdown y}(y,z)}{1-p_{D\diagdown y}(y,y)}=p_{D}(y,z)=s_{D}(y,z)=s_{1}(y,z)\) and hence

$$p_{D\diagdown y}(\cdot,z)=s_{1}(\cdot,z)-s_{1}(\cdot,y)s_{1}(y,z). $$
(50)

Using formula (50) for x = z, we obtain p_{D∖y}(z,z) = s_1(z,z) − s_1(z,y)s_1(y,z), so

$$1-s_{1}(y,z)s_{1}(z,y)A=\bigl(1-s_{1}(z,z)\bigr)A. $$
(51)

Now using this equality and formula (50), formula (49) can be written as

$$s_{2}(\cdot,y)=s_{1}(\cdot,y)\bigl(1-s_{1}(z,z)\bigr)A+s_{1}(\cdot,z)s_{1}(z,y)A. $$
(52)

Using x = y in (52), we obtain s_2(y,y) = s_1(y,y)(1 − s_1(z,z))A + s_1(y,z)s_1(z,y)A. Then, using formula (51), we obtain 1 − s_2(y,y) = (1 − s_1(y,y))(1 − s_1(z,z))A, and dividing both parts of formula (52) by 1 − s_2(y,y), we see that (48) for y ∈ D can be written as (47). □
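The single-state elimination step used repeatedly in this proof, for instance the displayed relation between p_D and p_{D∖y} above, can be illustrated numerically. The following Python sketch (the 4-state transition matrix is hypothetical, not from the paper) eliminates one state from a stochastic matrix and checks that the reduced matrix is again stochastic, the basic invariant behind the recursive formulas of this section.

# Minimal sketch with hypothetical data: one step of state elimination for a Markov chain.
# Eliminating a state y replaces p(x, z) by p(x, z) + p(x, y) * p(y, z) / (1 - p(y, y)),
# the same update pattern as the displayed relation between p_D and p_{D\y} above.
import numpy as np

p = np.array([[0.10, 0.30, 0.40, 0.20],    # hypothetical transition matrix on {0, 1, 2, 3}
              [0.20, 0.20, 0.50, 0.10],
              [0.30, 0.30, 0.20, 0.20],
              [0.25, 0.25, 0.25, 0.25]])

def eliminate(p, y):
    """Transition matrix of the chain observed only on the states other than y."""
    keep = [i for i in range(p.shape[0]) if i != y]
    return p[np.ix_(keep, keep)] + np.outer(p[keep, y], p[y, keep]) / (1.0 - p[y, y])

q = eliminate(p, 2)
print(q)
print(q.sum(axis=1))   # each row still sums to 1: the reduced chain is stochastic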


Cite this article

Sonin, I.M., Steinberg, C. Continue, quit, restart probability model. Ann Oper Res 241, 295–318 (2016). https://doi.org/10.1007/s10479-012-1089-2
