Dynamic Games with Strategic Complements and Large Number of Players

1909 Accesses
1 Altmetric
Explore all metrics

Abstract

We study dynamic games with strategic complements where each player is modeled by a scalar flow dynamical system with a controlled input and an uncontrolled output. The model originates in inventory control problems with shared set-up costs and a large number of players. An activation cost is shared among active players, namely players who control their dynamics at a given time. As a main contribution, we prove that two-threshold strategies, like the (s, S) strategies used in inventory control, are mean-field equilibrium strategies in dynamic games with a large number of players. Furthermore, we provide conditions for the convergence of the nonstationary mean-field equilibrium to the stationary one in the limit.

Dynamic Coordination Games with Activation Costs

Article Open access 09 January 2021

Zero-sum stochastic games with the average-value-at-risk criterion

Article 10 April 2023

Zero-Sum Average Cost Semi-Markov Games with Weakly Continuous Transition Probabilities and a Minimax Semi-Markov Inventory Problem

Article 11 February 2022

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Games with strategic complements are characterized by the property that a player has an increasing incentive to take a given action as more neighbors take that same action [15, Chapter 9]. Examples of such games, though sometimes not explicitly mentioned, arise in learning in social networks [11], collective behavior in social networks [12], systemic risk [6], and cascading failures in financial networks [8, 18]. Coordination games represent a subset of games with strategic complements whereby the payoff of a player scales with the percentage of players taking an action. This paper studies a dynamic game with strategic complements where the players have to coordinate actions within a finite horizon window [2, 3, 19]. The dynamics of each player is a fluid flow dynamical system subject to a controlled input flow and a stochastic uncontrolled output flow. Activating an input flow requires an activation cost. The discrepancy between input and output flow accumulates in a state variable. Coupling derives from the activation cost to be shared among all players who activate an input flow at a given time, called active players. Sharing the activation cost determines an incentive for the players to be active with an increasing number of active players. All results can be extended to the vector case by using the robust decomposition approach in [4, Section 3].

We extend the analysis in [19] to a mean-field scenario [1, 9, 10, 13, 14, 16, 17] characterized by a microscopic and macroscopic dynamics. The microscopic dynamics is the fluid flow system determining the state of each player. The optimal control is obtained from solving a backward Bellman equation in the value function. The macroscopic dynamics is in the form of a Markov chain dynamics where the nodes represent all possible values for the players’ states, and the links are weighted by the transition probabilities between states. The Markov chain dynamics determines the evolution of the distribution of players’ states among the different values. The resulting game involves both the microscopic and macroscopic dynamics in a unified framework and takes the form of a discrete-state discrete-time mean-field game. Such a game consists of two coupled difference equations, a backward Bellman equation in the value function, and a forward Markov dynamics in the distribution of the players’ states. The mean-field equilibrium is obtained as solution of these two coupled equations. The stationary solution is obtained in the asymptotic limit when the horizon length goes to infinity.

Contribution This study contributes in different ways to advance the theory on dynamic coordination games with activation costs and extend for the first time the use of two-threshold strategies to mean-field games. An example of two-threshold strategy is the (s, S) strategy used in inventory control, see [7] and [5, Chapter 4]. In [5], the author derives the thresholds of the (s, S) policy for an individual player considering a fixed cost. In this work, we present the explicit expression for these thresholds considering a large number of players and an activation cost that depends on the fraction of active players at each time t. We recall that (s, S) strategies are strategies where replenishments occur anytime the inventory level goes below a lower threshold s. Replenishments bring back the inventory level up to a higher threshold S. In particular, we highlight the following results:

Strategies at a Nash equilibrium have a threshold structure. Lower and upper thresholds have an explicit expression in the deterministic case, namely when the demand is known, or in single-stage games.
Two-threshold (s, S) strategies are mean-field equilibrium strategies for the stationary solution in dynamic games with a large number of players. Stationary solutions imply that the fixed cost is constant over the horizon. The game decomposes into a set of uncoupled optimization problems. In each problem, a single player has to find the optimal strategy under a fixed cost. We then use the well-known optimality of (s, S) strategies under fixed cost to show that such strategies are best responses for the game. Furthermore, we provide conditions for the convergence of the nonstationary mean-field equilibrium to the stationary one in the limit.
We corroborate our results with a numerical analysis of a stylized inventory model.

This paper is organized as follows. In Sect. 2, we introduce the model. In Sect. 3, we obtain the optimal thresholds. In Sect. 4, we study convergence to stationary solutions. In Sect. 5, we provide numerical analysis. Finally, in Sect. 6, we draw conclusions and discuss future works.

2 Mean-Field Inventory Game

We consider a large number of indistinguishable players and a finite number of states (inventory levels). Let us assume that at stage $t=0,1,...,N$ the inventory level for an individual player is $x^t\in {\mathbb {Z}}$, the player faces a stochastic demand $\omega ^t \in {\mathbb {Z}}_+$ and orders a quantity $u^t\in U^t\subseteq {\mathbb {Z}}_+$, where $U^t$ denotes the set of admissible actions, ${\mathbb {Z}}$ is the set of integers, and ${\mathbb {Z}}_+$ is the set of nonnegative integers. Hence, the microscopic dynamics of the player evolves according to a linear finite-state, discrete-time model:

$$\begin{aligned} x^{t+1}=x^t+u^t-\omega ^t, \quad \hbox { for all}\ t=0,1,\ldots ,N. \end{aligned}$$

(1)

According to [5] in (s, S) strategies, replenishments occur anytime the inventory level goes below a lower threshold s and when a replenishment takes place it brings back the inventory level up to the upper threshold S [7]. In accordance with this strategy, let us define the control $u^t$ as follows:

$$\begin{aligned} u^t:=\mu (x^t):=\left\{ \begin{array}{cc} S-x^t, &{} \qquad \text{ if } \quad x^t < s,\\ 0, &{} \qquad \text{ if } \quad x^t \ge s, \end{array}\right. \quad \hbox { for all}\ t=0,1,\ldots ,N. \end{aligned}$$

(2)

After substituting the (s, S) strategy as defined in (2) in the dynamics (1), we obtain

$$\begin{aligned} x^{t+1}=\left\{ \begin{array}{lll} S-\omega ^t, &{} \qquad \text{ if } \quad x^t < s,\\ x^t-\omega ^t, &{} \qquad \text{ if } \quad x^t \ge s, \end{array} \right. \quad \hbox { for all}\ t=0,1,\ldots ,N. \end{aligned}$$

(3)

To define the random parameter $\omega ^t$ that corresponds to the uncertain demand at time t, let us consider a probability distribution $\phi ^t:{\mathbb {Z}}_+\rightarrow [0,1]$ such that $\omega \mapsto \phi ^t_\omega $; here, $\phi ^t_\omega $ is the probability of having a demand of $\omega $ items at time t for all $\omega \in {\mathbb {Z}}_+$.

To derive a macroscopic dynamics for the system, let us denote by $\pi ^t$ the distribution of players over the states at time t. Hence, $\pi ^t$ is a vector that stores in each of its entries the fraction of players in each possible state. In particular, the jth entry $\pi _j^t$ represents the fraction of players whose state is $x^t=j$ at time t and derives from the following distribution function:

$$\begin{aligned} \pi _j^t:{\mathbb {Z}} \rightarrow [0,1], \quad j \mapsto \pi _j^t \in [0,1]. \end{aligned}$$

Occasionally, we will view $\pi ^t$ as an infinite-dimensional vector in ${\mathbb {Z}}$. Also, let $\pi ^0$ be the initial distribution of players over the states.

At every time step t, the players in state l decide the amount to reorder $u^t$. The order quantity, as well as the demand distribution $\omega ^t$, determines the transition probability $P_{lj}^t$ from state l to state j. Given the transition probabilities $P_{lj}^t$ at time $0\le t <N$, the distribution of players at time $t+1$ is given by the following macroscopic model which takes the form of a Markov chain:

$$\begin{aligned} \pi _j^{t+1} = \sum _{l \in {\mathbb {Z}}} \pi _l^t P_{lj}^t, \text{ for } \text{ all } j \in {\mathbb {Z}}, \quad \hbox { for all}\ t=0,1,\ldots ,N. \end{aligned}$$

(4)

The transition probabilities $P_{lj}^t$ used in the above equation are linked to the probability mass functions used to model the stochastic demand. To see this, let $\phi ^t_0, \, \phi ^t_1, \, \phi _2^t, \ldots $ be the probability mass functions at time t associated with $\omega ^t = 0,1,2 \ldots $, respectively. The relation between $P_{lj}^t$ and $\phi ^t_0, \, \phi ^t_1, \, \phi ^t_2, \ldots $ is as follows:

$$\begin{aligned} \begin{array}{ll} [\,\ldots \, P_{l,S-2}^t \, P^t_{l,S-1} \, P^t_{l,S}]= [\,\ldots \,\phi _2^t \,\quad \phi ^t_1 \,\quad \phi ^t_0], \quad l < s. \end{array} \end{aligned}$$

(5)

The above equation defines the transition probabilities from any state below the threshold, where the players reorder up to level S. For any state equal to or greater than the threshold s, the transition probabilities are instead given by:

$$\begin{aligned} \begin{array}{ll} [\,\ldots \, P^t_{l,l-2} \, P^t_{l,l-1} \, P^t_{l,l}]= [\, \ldots \, \phi ^t_2 \,\quad \phi ^t_1 \,\quad \phi ^t_0], \quad l \ge s. \end{array} \end{aligned}$$

(6)

Figure 1 depicts the Markov chain that represents the macroscopic dynamics (4). In the mean-field context, the fraction of active players, which are the players whose inventory level is below or equal to the lower threshold $s^t$, is then given by:

$$\begin{aligned} a^t = \sum _{l,l< s^t} \pi _l^t, \quad \text{ for } \text{ all } t=0,1,\ldots ,N. \end{aligned}$$

(7)

Likewise, we can define a value function for any time t which represents the expected optimal cost for a player in the generic state j at time t:

$$\begin{aligned} v^t: {\mathbb {Z}} \rightarrow {\mathbb {R}}_+, \quad j \mapsto v_j^t \in {\mathbb {R}}_+. \end{aligned}$$

Let the transition probability matrix at time t be denoted by $P^t=[P^t_{lj}]_{l,j\in {\mathbb {Z}}}$. Associated with each probability $P^t_{lj}$, there is a transition cost for going from state l to state j, which depends also on the distribution of players $\pi ^t$; let us denote such cost as $c_{lj}^t(\pi ^t,P^t)$.

The average cost for the players in state l, when their dynamics follow the transition probability matrix $P^t$, for a given distribution $\pi ^t$ and the future cost defined by the value function $v_j^{t+1}$, for all $j\in {\mathbb {Z}}$, are given by:

$$\begin{aligned} e_l^t(\pi ^t,P^t,v^{t+1}) = \sum _{j \in {\mathbb {Z}}} \left[ c_{lj}(\pi ^t,P^t) P^t_{lj} + v^{t+1}_j P^t_{lj}\right] , \quad \hbox { for all}\ t=0,1,\ldots ,N. \end{aligned}$$

We are in the position to provide the following definition of Nash equilibrium in the mean-field limit, in discrete-time, and in discrete-state space.

Definition 2.1

(Definition 1 in [9]) Let $ {\mathbb {S}}^{{\mathbb {Z}}}$ denote the simplex in ${\mathbb {Z}}$. Fix a probability vector $\pi \in {\mathbb {S}}^{{\mathbb {Z}}}$ and a cost vector $v \in {\mathbb {R}}^{{\mathbb {Z}}}$. A stochastic matrix $P \in [0,1]^{{\mathbb {Z}} \times {\mathbb {Z}}}$ is a Nash minimizer of $e(\pi ,\cdot ,v)$ if for each $l \in {\mathbb {Z}}$ and any $q \in [0,1]^{\mathbb {Z}}$,

$$\begin{aligned} e_l(\pi ,P,v)\le e_l(\pi ,{\mathcal {P}} (P,q,l),v), \end{aligned}$$

where ${\mathcal {P}} (P,q,l)$ is obtained from matrix P by replacing the lth row by $q \in {\mathbb {S}}^{{\mathbb {Z}}}$.

We say that the following pair of time-varying distribution and value function

$$\begin{aligned} \{(\pi ^t,v^t); \, 0 \le t \le N\} \end{aligned}$$

is a mean-field equilibrium if it is the solution of the following system of equations for all $t=0,1,\ldots , N$:

$$\begin{aligned} \left\{ \begin{array}{ll} v_l^t = \sum _{j} \left[ c_{lj}(\pi ^t,P^t) P_{lj}^t + v_j^{t+1} P_{lj}^t\right] , &{}\hbox { }\ \forall j \in {\mathbb {Z}},\\ \pi _j^{t+1} = \sum _{l} \pi _l^t P_{lj}^t, &{} \forall j \in {\mathbb {Z}}, \end{array} \right. \end{aligned}$$

(8)

where $P^t$ is a Nash minimizer of $e(\pi ^t,\cdot ,v^{t+1})$.

In the above set of equations, we set the transition cost $c^t_{lj}=c_{lj}(\pi ^t,P^t)$ at time t as follows:

$$\begin{aligned} \left\{ \begin{array}{lll} K^t + r (S-l) + p \max (0,-j) +h \max (0,j), &{}\hbox { if}\ l<s \\ p \max (0,-j)+h \max (0,j), &{}\text{ otherwise }, \end{array} \right. \end{aligned}$$

(9)

where $K^t:=K(a^t)\ge 0$ is the transportation cost charged to each player that is active at time t, $r\ge 0$ is the fixed purchase cost per stock unit, $h\ge 0$ the fixed penalty on holding and $p>h\ge 0$ the fixed penalty on shortage.

The above transition cost can be rewritten in compact form as:

$$\begin{aligned} c^t_{lj} = \Big (K^t + r (S-l) \Big ) \delta (l<s) + p \max (0,-j) +h \max (0,j), \end{aligned}$$

where

$$\begin{aligned} \delta (l<s) = \left\{ \begin{array}{lll} 1, &{} \text{ if } l<s\text{, } \\ 0, &{} \text{ otherwise }. \end{array} \right. \end{aligned}$$

(10)

Note that the transportation cost $K^t=K(a^t)$ paid by each player is a monotonically decreasing function on the fraction of active players at time t. As the fraction of active players $a^t$ increases, the transportation cost $K^t$ decreases. If a player makes an order, it incentivizes other players to reorder; this implies that the cost of one player also depends on the actions of the other players. Let us assume a large number of players M and a total transportation cost ${\tilde{K}}$. As an example, if the total cost is equally divided among the active players, the individual transportation cost charged to each player is given by $K(a^t)=\frac{{\tilde{K}}}{Ma^t}$ if the player is active, and it is zero otherwise.

3 Optimal Thresholds

In this section, we provide explicit expressions to obtain the lower threshold s and the upper threshold S, as a function of the probability distribution function $\phi ^t$ which determines the stochastic demand at each time t.

Let us denote by $y^t=x^t+u^t$, the instantaneous inventory position, i.e., the inventory level just after the order has been issued, and let us define the following stage cost function:

$$\begin{aligned} \begin{array}{lll} G^{t}(y^{t})= ry^t+p {\mathbb {E}} \{ \max (0,-(y^t-\omega ^t))\} + h {\mathbb {E}} \{ \max (0,y^t-\omega ^t) \}. \end{array} \end{aligned}$$

(11)

Then, we have for the value function:

$$\begin{aligned} \begin{array}{lll} v_x^t =-rx^t+\min _{y^t\ge x^t}[K^t+G^t(y^t),G^t(x^t)], \end{array} \end{aligned}$$

(12)

where the term $-rx^t+K^t+G^t(y^t)$ indicates the stage cost in case of reordering, and $-rx^t+G^t(x^t)$ indicates the stage cost in case of no reordering. Hence, note that the cost of reordering is given by:

$$\begin{aligned} \begin{array}{ll} K^t -rx^t +&{}G^t(y^t)= K^t + ru^t +p {\mathbb {E}} \{ \max (0,-(y^t-\omega ^t))\} + h {\mathbb {E}} \{ \max (0,y^t -\omega ^t) \} \\ &{}=K^t + r (y^t - x^t )+p {\mathbb {E}} \{ \max (0,-(y^t -\omega ^t))\} + h {\mathbb {E}} \{ \max (0,y^t-\omega ^t) \}. \\ \end{array} \end{aligned}$$

To obtain $S^t$, for an instantaneous inventory position $\gamma $, first let us define the expected holding ${\mathbb {E}} \{\max (0,\gamma -\omega ^t)\}$ and expected shortage $\mathbb E\{\max (0,-(\gamma -\omega ^t))\}$ as follows:

$$\begin{aligned} {\mathbb {E}} \{\max (0,\gamma -\omega ^t)\}=\varPsi ^t_h[\gamma ]:=\sum _{\omega =0}^\gamma (\gamma -\omega ) \phi ^t_{\omega }, \\ {\mathbb {E}}\{\max (0,-(\gamma -\omega ^t))\}=\varPsi ^t_s[\gamma ]:=\sum _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega }, \end{aligned}$$

where $\phi ^{t}_{\omega }$ is the probability of having a demand of $\omega $ items at time t.

Hence, the stage cost function $G^t(\gamma )$ is given by:

$$\begin{aligned} G^t(\gamma ) = r ( \gamma - x^t ) + h \underbrace{\sum _{\omega =0}^\gamma (\gamma -\omega ) \phi ^t_{\omega }}_{:=\varPsi ^t_h[\gamma ]} + p \underbrace{\sum _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega }}_{:=\varPsi ^t_s[\gamma ]}. \end{aligned}$$

By applying the discrete difference operator $\varDelta $, to function $G^t(\gamma )$ we then have:

$$\begin{aligned} \begin{array}{ll} \varDelta G^t(\gamma ) &{}:= G^t(\gamma +1) - G^t(\gamma ) \\ &{}= r( \gamma +1 - x^t) + h \sum _{\omega =0}^{\gamma +1}(\gamma +1 -\omega ) \phi ^t_{\omega } + p \sum _{\omega =\gamma +2}^\infty (\omega - \gamma -1) \phi ^t_{\omega } \\ &{}\quad - r ( \gamma - x^t) - h \sum _{\omega =0}^\gamma (\gamma -\omega ) \phi ^t_{\omega } - p \sum _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega }\\ &{}= r + h \sum _{\omega =0}^\gamma \phi ^t_\omega - p \sum _{\omega =\gamma +1}^\infty \phi ^t_{\omega }\\ &{}= r + h \varPhi ^t_\omega [\gamma ] - p (1 - \varPhi ^t_\omega [\gamma ]), \end{array} \end{aligned}$$

where $\varPhi ^t_\omega [\gamma ]$ is the cumulative distribution function defined as:

$$\begin{aligned} \varPhi ^t_\omega [\gamma ]:=\sum _{\omega =0}^\gamma \phi ^t_\omega . \end{aligned}$$

The order-up-to level $S^t$ is the optimal $\gamma $, which is obtained from solving:

$$\begin{aligned} \begin{array}{ll} \min _\gamma \, \{\gamma | \, \varDelta G^t(\gamma ) \ge 0\} = \min _\gamma \,\{ \gamma | \, r + h \varPhi ^t_\omega [\gamma ] - p (1 - \varPhi ^t_\omega [\gamma ]) \ge 0\}. \end{array} \end{aligned}$$

From the above, we then obtain (Fig. 2):

$$\begin{aligned} S^t = \arg \min _\gamma \Big \{\gamma | \, \varPhi ^t_\omega [\gamma ] \ge \frac{-r+ p }{h +p} \Big \}. \end{aligned}$$

(13)

To obtain $s^t$, let us consider the cost of not reordering, which is given by:

$$\begin{aligned} \begin{array}{ll} -rx^t+G^t(x^t) &{}= p {\mathbb {E}} \{ \max (0,-(x^t-\omega ^t))\} + h {\mathbb {E}} \{ \max (0,x^t -\omega ^t) \} \\ &{}=h \sum _{\omega =0}^{x^t} (x^t-\omega ) \phi ^t_{\omega } + p \sum _{\omega =x^t+1}^\infty (\omega - x^t) \phi ^t_{\omega }\\ &{}= h \varPsi ^t_h[\gamma ] + p \varPsi ^t_s[\gamma ]. \end{array} \end{aligned}$$

(14)

From the above, we then obtain:

$$\begin{aligned} s^t:= \arg \min _{x^t} \{x^t | \, -rx^t +G^t(x^t) \le K^t-rS^t+G^t(S^t)\}. \end{aligned}$$

In particular, we have (Fig. 3):

$$\begin{aligned} \begin{array}{ll} s^t:= \arg \min _{x^t} \Big \{x^t | \, h \varPsi ^t_h[x^t]+ p \varPsi ^t_s[x^t] \le K^t+r ( S^t - x^t) + h \varPsi ^t_h[S^t] + p \varPsi ^t_s[S^t] \Big \}. \end{array}\nonumber \\ \end{aligned}$$

(15)

Observe that the right-hand side of the inequality in (15) corresponds to the cost of reordering once we obtain the optimal upper threshold $S^t$.

In order to obtain the lower threshold $s^t$, we have to find the minimum inventory level $x^t$ that satisfies (15). As the penalty on shortage is greater than the penalty on holding ($p>h$), if the inventory level decreases, then the left-hand side of the inequality in (15) increases. If the transportation cost $K^t$ decreases, the right-hand side of the inequality decreases and the minimum inventory level $x^t$ that satisfies (15) increases. Therefore, the lower the transportation cost the higher the threshold $s^t$.

Equations (13) and (15) represent explicit expressions to obtain the two thresholds and fully characterize the reordering strategy once the probability distribution of the stochastic demand is given.

Once the thresholds are obtained, we implement the control $u^t$, which is given by (2), and we obtain the resulting dynamics (3).

In the following, we study the time evolution of the first-order moment of the inventories. The expected inventory at time t when $x^t$ is distributed according to $\pi ^t$ is given by:

$$\begin{aligned} {\mathbb {E}} x^{t} = \sum _{l} \pi _l^t l. \end{aligned}$$

Then, from (3) the expected inventory at time $t+1$ when $x^{t+1}$ is distributed according to $\pi ^{t+1}$ and the demand $\omega $ takes values in the support $\varOmega \subseteq {\mathbb {Z}}_+$, follows the recursion:

$$\begin{aligned} {\mathbb {E}} x^{t+1}&= \sum _{l} \pi _l^{t+1} l = \sum _{\omega \in \varOmega } [(S^t-\omega ) (\sum _{l,l< s^t} \pi ^t_l) + \sum _{l,l \ge s^t} (l -\omega ) \pi _l^t ] \phi _\omega \nonumber \\&= \sum _{\omega \in \varOmega } [ (S^t-\omega ) a^t + \sum _{l,l \ge s^t} (l -\omega ) \pi _l^t] \phi _\omega . \end{aligned}$$

(16)

From $\sum _{l,l\ge s^t} \pi ^t_l = 1 - a^t$, we have:

$$\begin{aligned} {\mathbb {E}} x^{t+1}&= \sum _{\omega \in \varOmega } [S^t a^t -\omega + \sum _{l,l \ge s^t} l \pi _l^t] \phi _\omega = \sum _{\omega \in \varOmega } [S^t a^t -\omega + \sum _{l} l \pi _l^t - \sum _{l,l< s^t} l \pi _l^t] \phi _\omega \nonumber \\&= \sum _{\omega \in \varOmega } [S^t a^t -\omega + {\mathbb {E}} x^{t} - \sum _{l,l< s^t} l \pi _l^t ] \phi _\omega = \sum _{\omega \in \varOmega } [S^t (\sum _{l,l< s^t} \pi _l^t ) -\omega + {\mathbb {E}} x^{t} - \sum _{l,l< s^t} l \pi _l^t ] \phi _\omega \nonumber \\&= \sum _{\omega \in \varOmega } [-\omega \phi _\omega ] + \sum _{l,l < s^t} (S^t - l) \pi _l^t + {\mathbb {E}} x^{t}. \end{aligned}$$

(17)

In the numerical example, we make use of (17) to obtain the first moment of the distribution of the inventory at time $t+1$.

4 Stationarity

In this section, we are interested in stationary solutions, namely solutions where both the distribution function and the value function do not depend on time.

Remark 4.1

If the distribution function and the value function do not depend on time, we have a stationary fraction of active players, namely

$$\begin{aligned} {{\tilde{a}}} = \sum _{l,l < s} \pi _l. \end{aligned}$$

In addition, the activation cost is a function of the fraction of active players. Therefore, the cost $K({\tilde{a}})$ is fixed over the horizon and it depends on the stationary solution. Now, we can apply the results obtained in Sect. 3 for a fixed activation cost K, to obtain the optimal lower threshold s and the optimal upper threshold S.

Let us denote by $(\pi ,v)$ the generic stationary solution. The pair $(\pi ,v)$ is a mean-field equilibrium at steady state if it satisfies the following set of equations:

$$\begin{aligned} \left\{ \begin{array}{ll} v_l = \sum _{j} c_{lj}(\pi ,P) P_{lj} + v_j P_{lj} - {{\bar{\lambda }}}, \\ \pi _j = \sum _{l} \pi _l P_{lj}, \end{array} \right. \end{aligned}$$

(18)

where ${{\bar{\lambda }}}$ is the optimal average cost per stage. In [9], the authors prove that the optimal average cost can be seen as an average transition cost over the population of players. If ${\bar{P}}$ is the optimal transition matrix and $({\bar{\pi }},{\bar{v}})$ is a stationary solution of (18), then ${\bar{\lambda }}=\sum _{lj}\pi _jc_{lj}({\bar{\pi }},{\bar{P}}){\bar{P}}_{lj}$.

Assuming a bounded support for the demand $\omega $ and therefore also for the inventory level x, which we denote by $[1,\eta ]$, let us define matrix ${{\tilde{A}}}=[{\tilde{a}}_{ij}]_{i,j\in [1,\eta ]}$, where:

$$\begin{aligned} {\tilde{a}}_{ij}=\left\{ \begin{array}{ll} -P_{0i} - \sum _{k,k\not = i} P_{ik}, &{} \hbox { if}\ j=i,\\ -P_{0j} + P_{ij}, &{} \hbox { if}\ j\not =i. \end{array} \right. \end{aligned}$$

(19)

Let us define the new variable $\xi ^t_{lk}=[v^t_l-v^t_k]$, which can be seen as a potential difference between two generic states or nodes of the Markov chain l and k, and the vector $\xi ^t_l:=[\xi ^t_{lj}]_{j\in {\mathbb {Z}}}=[v^t_l-v^t_j]_{j\in {\mathbb {Z}}}$. In particular, $\xi ^t_0:=[\xi ^t_{0j}]_{j\in {\mathbb {Z}}}=[v^t_0-v^t_j]_{j\in {\mathbb {Z}}}$. In addition, denote $P^t_l= [P^t_{lj}]_{j \in {\mathbb {Z}}}$ and $c_l= [c_{lj}]_{j \in {\mathbb {Z}}}$ for all $l \in {\mathbb {Z}}$.

Before discussing the main contribution of this section, that is the convergence of nonstationary mean-field equilibrium to the stationary one in the limit, we present an intermediate result to verify the structure of ${\tilde{A}}$ introduced in (19).

Lemma 4.1

Let a bounded support for the demand $\omega $ and for the inventory level x be given and denote it by $[1,\eta ]$. The discrete-time dynamics of the potential difference $\xi ^t_0=[v^t_0-v^t_j]_{j\in [1,\eta ]}$ is given by:

$$\begin{aligned} {\dot{\xi }}^t_0={\tilde{A}}\xi ^t_0+{\tilde{b}}, \end{aligned}$$

(20)

where ${\tilde{A}}=[{\tilde{a}}^t_{ij}]_{i,j\in [1,\eta ]}$, each entry ${\tilde{a}}_{ij}^t$ is of the form (19) and ${\tilde{b}}=[c_0^TP^t_0-c_j^TP^t_j]_{j\in [1,\eta ]}$.

Proof

The proof is in the Appendix. $\square $

In the following theorem, we present the conditions for the nonstationary mean-field equilibrium, which is a solution of (8), to converge to the stationary solution of problem (18). Note that the stochastic matrix $P^t$ presented in equation (8) is a Nash minimizer of the average cost $e(\pi ^t,\cdot ,v^t)$.

Let $\pi [N](-N)$ be the initial distribution of players at the beginning of the horizon at time $-N$ and $v[N](N)_l$ the terminal cost at the end of the horizon at time N.

Theorem 4.1

Given $N>0$, a vector $\pi ^0 \in {\mathbb {Z}}$ and a terminal penalty $v^N_l\in {\mathbb {R}}_+$, let $(\pi [N], v[N])$ be the solution of (8) with initial-terminal conditions $\pi [N](-N) = \pi ^0$ and $v[N](N)_l = v^N_l$. Let $({\bar{\pi }},{\bar{v}})$ be a solution of the stationary problem (18). When $N \rightarrow \infty $

$$\begin{aligned} \pi [N]^0 \rightarrow {\bar{\pi }}, \qquad v[N]^0 \rightarrow {\bar{v}}, \end{aligned}$$

(21)

if $det({{\tilde{A}}})>0$.

Proof

The proof is in the Appendix. $\square $

5 Numerical Analysis

We consider an example where the demand $\omega ^t \in \varOmega := \{0,1,2,3\}$ and it is uniformly distributed, namely by using the notation $\phi _\omega $ to indicate the probability that $\omega ^t = \omega $, we have $\phi _i=\frac{1}{4}$ for all $i \in \varOmega $.

Assume that the proportional purchase cost is $r=1$, the shortage cost is $p=10$, and the holding cost is $h=2$. In the case of single-stage optimization, we have that the order-up-to level is given by:

$$\begin{aligned} S = \arg \min _\gamma \Big \{\gamma | \, \varPhi ^t_\omega [\gamma ] \ge \frac{-c + p }{h +p} \Big \}. \end{aligned}$$

From the above, we obtain $S=2$. Indeed for $\gamma =3$, we have:

$$\begin{aligned} \varPhi ^t_\omega [3] =1 \ge \frac{-r + p }{h +p} = \frac{3}{4}. \end{aligned}$$

For $\gamma =2$, we obtain:

$$\begin{aligned} \varPhi ^t_\omega [2] = \frac{3}{4} = \frac{-r + p }{h +p} = \frac{3}{4}, \end{aligned}$$

Differently, for $\gamma =1$ it holds

$$\begin{aligned} \varPhi ^t_\omega [1] = \frac{1}{2} \not \ge \frac{-r + p }{h +p} = \frac{3}{4}, \end{aligned}$$

and therefore

$$\begin{aligned} S=\arg \min _\gamma \Big \{\gamma | \, \varPhi ^t_\omega [\gamma ] \ge \frac{-r + p }{h +p} \Big \} =2. \end{aligned}$$

As for the reorder level s, we have:

$$\begin{aligned} \begin{array}{ll} s:= \arg \min _{x} \Big \{x | \, h \varPsi _h^t[x] + p \varPsi _s^t[x] \le K^t + r ( S - x ) + h \varPsi _h^t[S] + p \varPsi _s^t[S] \Big \}. \end{array}\nonumber \\ \end{aligned}$$

(22)

We show next that we have $s=1$.

Actually, for $x^t=1$ we obtain:

$$\begin{aligned} \begin{array}{ll} h \varPsi _h^t[1] + p \varPsi _s^t[1] = h \frac{1}{4} + p \frac{3}{4} = 8 &{}\le K^t + r + h \varPsi _h^t[2] + p \varPsi _s^t[2] \\ {} &{}= K^t + 1 + h \frac{3}{4} + p\frac{1}{4} =K^t + 5, \end{array}\end{aligned}$$

(23)

which is satisfied by any $K^t \ge 3$.

For $x^t=0$, we have:

$$\begin{aligned} \begin{array}{ll} h \varPsi _h^t[0] + p \varPsi _s^t[0] = p\frac{6}{4}= 15 &{}\le K^t + 2 r + h \varPsi _h^t[2]+ p \varPsi _s^t[2] \\ {} &{}= K^t + 2 + h \frac{3}{4} + p\frac{1}{4} =K^t + 6, \end{array}\end{aligned}$$

(24)

which is satisfied by any $K^t \ge 9$.

For any $K^t < 9$, we then have:

$$\begin{aligned} \begin{array}{ll} s:= \arg \min _{x} \Big \{x | \, h \varPsi _h^t[x] + p \varPsi _s^t[x] \le K^t + r ( S - x ) + h \varPsi _h^t[S]+ p \varPsi _s^t[S] \Big \}=1.\end{array}\end{aligned}$$

We can conclude then that for any $K^t$, such that $1 \le K^t < 9$, we have the reorder level $s=1$ and the order-up-to level $S=2$.

Then, from (3) the microscopic dynamics is defined in the bounded support $\{-2,-1,0,1,2\}$, namely $x^t \in \{-2,-1,0,1,2\}$ for all $t\ge 0$ and is given by:

$$\begin{aligned} x^{t+1}=\left\{ \begin{array}{cc} 2-\omega ^t, &{} \qquad \text{ if } \quad x^t =-2,-1,0,\\ x^t-\omega ^t, &{} \qquad \text{ if } \quad x^t = 1,2. \end{array}\right. \end{aligned}$$

(25)

The macroscopic dynamics corresponding to the microscopic dynamics (25) is the Markov chain displayed in Fig. 4.

As for the value function difference we have a $4 \times 4$ system where $l \in \{-2,-1,0,1,2\}$, which is given by:

$$\begin{aligned} \left[ \begin{array}{ll} {{\dot{\xi }}}_{-2-1} \\ {{\dot{\xi }}}_{-20}\\ {{\dot{\xi }}}_{-21}\\ {{\dot{\xi }}}_{-22} \end{array} \right] = \left[ \begin{array}{cccc} -1 &{} 0 &{} 0 &{} 0 \\ 0 &{} -1 &{} 0 &{} 0 \\ 0 &{} 0 &{} -1 &{} 0 \\ 0 &{} 0 &{} 0 &{} -1 \end{array} \right] \left[ \begin{array}{ll} \xi _{-2-1}\\ \xi _{-20} \\ \xi _{-21} \\ \xi _{-22} \end{array} \right] +\left[ \begin{array}{ll} \frac{1}{4}(\sum _{j=1}^{2}c_{-2j}-\sum _{j=1}^{2}c_{-1j}) \\ \frac{1}{4}(\sum _{j=1}^{2}c_{-2j}-\sum _{j=1}^{2}c_{0j}) \\ \frac{1}{4}(\sum _{j=1}^{2}c_{-2j}-\sum _{j=1}^{2}c_{1j}) \\ \frac{1}{4}(\sum _{j=1}^{2}c_{-2j}-\sum _{j=1}^{2}c_{2j}) \end{array} \right] .\nonumber \\ \end{aligned}$$

(26)

From (26), we note that the $det({{\tilde{A}}})=1>0$. From (17), we also have that the dynamics of the expected inventory (first moment) is given by:

$$\begin{aligned} \begin{array}{ll} {\mathbb {E}} x^{t+1} &{} = -2\pi _{-2}^{t+1} - \pi _{-1}^{t+1} + \pi _{1}^{t+1} + 2 \pi _2^{t+1} \\ &{} = \sum _{\omega \in \varOmega } [(2-\omega )(\pi _{-2}^{t} + \pi _{-1}^{t} + \pi _{0}^{t})+(1-\omega )\pi _1^{t}+(2-\omega )\pi _2^{t}]\phi _\omega \\ &{} = \sum _{\omega \in \varOmega } [2(\pi _{-2}^{t} + \pi _{-1}^{t} + \pi _{0}^{t}) + \pi _1^t + 2\pi _2^t - \omega (\pi _{-2}^{t} + \pi _{-1}^{t} + \pi _{0}^{t} + \pi _1^t + \pi _2^t)]\phi _\omega \\ &{} = \sum _{\omega \in \varOmega }[-2\pi _{-2}^t-\pi _{-1}^t+\pi _1^t+2\pi _2^t+4\pi _{-2}^t+3\pi _{-1}^t+2\pi _0^t-\omega ]\phi _\omega \\ &{} = \sum _{\omega \in \varOmega } (-\omega \phi _{\omega })+\sum _{l,l<1}(2-l)\pi _l^t+{\mathbb {E}}x^t. \end{array}\nonumber \\ \end{aligned}$$

(27)

The rest of this section involves numerical analysis for a system of 100 indistinguishable players. All simulations are carried out with MATLAB on an Intel(R) Core(TM)2 Duo, CPU P8400 at 2.27 GHz, and a 3GB of RAM. The horizon window consists of $T=200$ iterations. For each player, we simulate (25) for three cases characterized by a different initial distribution.

The initial state is obtained from a random uniform distribution in $\{1,2\}$ for case 1, in $\{-2,0\}$ for case 2, and in $\{-2,2\}$ for case 3 using the commands x0=randi([1,2],n,1), x0=randi([-2,0],n,1), and x0=randi([-2,2],n,1), respectively. The demand is obtained in accordance with $\phi _i$ and is generated using the command w=randi([0,3],n,T).

The step size is $dt=0.1$, the proportional purchase cost is $r=1$, the shortage cost is $p=10$, and the holding cost is $h=2$.

Figure 5 displays the time plot of the distribution $\pi ^t$ for all $t\in [0,T]$ for the three cases. The distribution at steady state is greater in state $-1$, 0, and 1 (red, yellow, and purple lines, respectively). Note that, in accordance with Theorem 4.1, the three cases with different initial distribution have the same distribution at steady state. During the simulation, we assume any 50 iterations the states are reset to their initial value, to investigate the time response during the transients.

Figure 6 displays the time plot of the microscopic dynamics for a single player. In other words, the plot shows the inventory level (the state) of a player. Observe that, according to (25), the inventory level of the individual player takes its values in the bounded support $\{-2,-1,0,1,2\}$, where the lower threshold is $s=1$ and the upper threshold is $S=2$. The player’s inventory is for most of the time in state 0 and 1, which is in accordance with the greater values of the distribution in those states obtained from the macroscopic dynamics in the previous figure. Therefore, we can observe a clear connection between the macroscopic dynamics (Fig. 5) and the microscopic dynamics for a single player (Fig. 6).

In the next example, we analyze the same system with 100 indistinguishable players. The purchase, shortage, and holding costs are as in the previous example, and we consider a transportation cost $K = 1200$, which will be divided among the active players at each time t. The horizon window consists again of $T = 200$ iterations. However, in this case we increase the demand set such that $w^t \in \varOmega := \{0,1,...,10\}$ and is uniformly distributed. The macroscopic dynamics is represented by the Markov chain displayed in Fig. 7.

In Fig. 8, it is represented the time plot of the macroscopic dynamics for one player. In accordance with (13) and (15), it is possible to see that the players reorder when their inventory level is lower than or equal to the threshold s, which also depends on the number of active players, and they reorder up to the upper threshold $S = 8$.

Figure 9 illustrates the time plot of the distribution $\pi ^t$ for three different initial states. The simulations were developed for three cases in which the initial states are obtained from a random uniform distribution in $\{0,1,...,8\}$ for case 1, in $\{-10,-9,...,-1,0\}$ for case 2, and in $\{-10,-9,...,-1,0,1,...,8\}$ for case 3. The states i displayed are $i = -8$ (blue), $i = -1$ (yellow), $i = 1$ (purple), and $i = 8$ (red). Note that, in accordance with Theorem 4.1, the four cases with different initial distribution have the same distribution at steady state. One can also see that the distribution at steady state is greater in state -1 and 1, which is consistent with Fig. 8. In Fig. 8 indeed, the inventory is for most of the time in states closer to state 0. In the same way as in the previous example, we can observe a clear connection between the macroscopic dynamics (Fig. 9) and the microscopic dynamics for a single player (Fig. 8). During this simulation, we assume any 50 iterations the states are reset to their initial value.

6 Conclusions

We have developed an abstraction in the form of a dynamic coordination game model where each player’s dynamics is a scalar fluid flow dynamical system characterized by a controlled input flow and an uncontrolled output flow. The players have to pay a share of the activation cost to control their dynamics at a given time. We have provided three main contributions. First, we have showed that if the retailers are rational players, then they benefit from using threshold strategies where the threshold is on the fraction of active players. Then, we have obtained explicit expressions for the lower and upper thresholds under specific circumstances. Third, we have extended our study to a scenario with a large number of players and we have proved that two-threshold strategies, such as the (s, S) strategies used in inventory control, are optimal strategies for the stationary solution. In this context, we have also provided conditions for the nonstationary mean-field equilibrium to converge to the stationary one in the limit.

A main key direction for future works is to explore the feasibility of the proposed coordination scheme in multi-vector energy systems (heat, gas, power) with special focus on coalitional bidding in decentralized energy trade. The ultimate goal is to investigate the benefits of aggregating independent wind power producers.

References

Adlakha, S., Johari, R.: Mean field equilibrium in dynamic games with strategic complementarities. Oper. Res. 61(4), 971–989 (2013)
Article MathSciNet MATH Google Scholar
Bauso, D., Giarrè, L., Pesenti, R.: Consensus in noncooperative dynamic games: a multi-retailer inventory application. IEEE Trans. Autom. Control 53(4), 998–1003 (2008)
Article MATH Google Scholar
Bauso, D., Giarrè, L., Pesenti, R.: Distributed consensus in noncooperative inventory games. Eur. J. Oper. Res. 192(3), 866–878 (2009)
Article MathSciNet MATH Google Scholar
Bauso, D., Zhu, Q., Başar, T.: Decomposition and mean-field approach to mixed integer optimal compensation problems. J. Optim. Theory Appl. 169, 606–630 (2016)
Article MathSciNet MATH Google Scholar
Bertsekas, D.P.: Dynamic Programming and Optimal Control, 2nd edn. Athena, Bellmont, MA (1995)
MATH Google Scholar
Cabrales, A., Gottardi, P., Vega-Redondo, F.: Risk sharing and contagion in networks. Rev. Financ. Stud. 30(9), 3086–3127 (2017)
Article Google Scholar
Clark, A., Scarf, S.: Optimal Policies for a multi-echelon inventory problem. Manage. Sci. 6(4), 475–490 (1960)
Article Google Scholar
Elliot, M., Golub, B., Jackson, M.O.: Financial networks and contagion. Am. Econom. Rev. 104(10), 3115–3153 (2014)
Article Google Scholar
Gomes, D.A., Mohr, J., Rigão Souza, R.: Discrete time, finite state space mean field games. J. Mathématiques Pures et Appliquées 93(3), 308–328 (2010)
Article MathSciNet MATH Google Scholar
Gomes, D.A., Saúde, J.: Mean field games models - a brief survey. Dyn. Games Appl. 4(2), 110–154 (2014)
Article MathSciNet MATH Google Scholar
González-Avella, J.C., Eguíluz, V.M., Marsili, M., Vega-Redondo, F., San Miguel, M.: Threshold learning dynamics in social networks. PLoS ONE 6(5), e20207 (2011)
Article Google Scholar
Granovetter, M.: Threshold models of collective behavior. Am. J. Sociol. 83(6), 1420–1443 (1978)
Article Google Scholar
Huang, M.Y., Caines, P.E., Malhamé, R.P.: Large population stochastic dynamic games: closed loop Kean-Vlasov systems and the nash certainty equivalence principle. Commun. Inf. Syst. 6(3), 221–252 (2006)
Article MathSciNet MATH Google Scholar
Huang, M.Y., Caines, P.E., Malhamé, R.P.: Large population cost-coupled LQG problems with non-uniform agents: individual-mass behaviour and decentralized $\epsilon $-Nash equilibria. IEEE Trans. Autom. Control 52(9), 1560–1571 (2007)
Article MATH Google Scholar
Jacksons, M.O.: Social and Economic Networks. Princeton University Press, USA (2010)
Book Google Scholar
Lasry, J.M., Lions, P.L.: Mean field games. Japan. J. Math. 2, 229–260 (2007)
Article MathSciNet MATH Google Scholar
Pesenti, R., Bauso, D.: Mean field linear quadratic games with set up costs. Dyn. Games Appl. 3(1), 89–104 (2013)
Article MathSciNet MATH Google Scholar
Rossi, W., Como, G., Fagnani, F.: Threshold Models of cascades in large-scale networks. IEEE Trans Net Sci. Eng. 6(2), 158–172 (2019)
Article MathSciNet Google Scholar
Ramirez, S., Bauso, D.: Dynamic coordination games with activation costs. Dynamic Games Appl. 11, 580–596 (2021)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

SMiLES Research Project, part of the Research Programme Sustainable Living Labs Dutch Research Council (NWO) 10.13039/501100016993 - Ministry of Infrastructure and Water Management, Taskforce for Applied Research (SIA) Top Sector Logistics.

Author information

Authors and Affiliations

Jan C. Willems Center for Systems and Control, Engineering and Technology Institute Groningen, University of Groningen, Nijenborgh 4, 9747, AG, Groningen, The Netherlands
Stefanny Ramirez & Dario Bauso
Dipartimento di Ingegneria, Università di Palermo, 90128, Palermo, Italy
Dario Bauso

Authors

Stefanny Ramirez
View author publications
You can also search for this author in PubMed Google Scholar
Dario Bauso
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Stefanny Ramirez.

Additional information

Communicated by Dusan Stipanovic.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this section, we provide the proof of the main results, Lemma 4.1 and Theorem 4.1, presented in Sect. 4.

Proof of Lemma 4.1

Let us rewrite (8) explicitly for $l\in [0,\eta ]$ as:

$$\begin{aligned} \begin{array}{ll} v_0^t = (c_{00}(\cdot ) + v_0^{t+1}) P_{00}^t + (c_{01}(\cdot ) + v_1^{t+1}) P_{01}^t + \ldots +(c_{0\eta }(\cdot ) + v_{\eta }^{t+1}) P_{0\eta }^t,\\ \vdots \\ v_\eta ^t = (c_{\eta 0}(\cdot ) + v_0^{t+1}) P_{\eta 0}^t + (c_{\eta 1}(\cdot ) + v_1^{t+1}) P_{\eta 1}^t \ldots + (c_{\eta \eta }(\cdot ) + v_\eta ^{t+1}) P_{\eta \eta }^t. \end{array} \end{aligned}$$

(28)

By subtracting the same quantity from the LHS and RHS, we obtain the following set of difference equations, when $l\in [0,\eta ]$:

$$\begin{aligned} \begin{array}{ll} \dot{v}_0^t = v_0^t - v_0^{t+1} \\ \quad = (c_{00}(\cdot ) + v_0^{t+1}) P_{00}^t + (c_{01}(\cdot ) + v_1^{t+1}) P_{01}^t \ldots +(c_{0\eta }(\cdot ) + v_{\eta }^{t+1}) P_{0\eta }^t- v_0^{t+1}\\ \quad = c_{00}(\cdot ) P_{00}^t + c_{01}(\cdot ) P_{01}^t + \ldots +c_{0\eta }(\cdot ) P_{0\eta }^t + (v_0^{t+1} - v_0^{t+1}) P_{00}^t \\ \qquad + (v_1^{t+1} - v_0^{t+1}) P_{01}^t + \ldots + (v_\eta ^{t+1}-v_0^{t+1}) P_{0\eta }^t,\\ \vdots \\ \dot{v}_{\eta }^t = v_{\eta }^t - v_{\eta }^{t+1} \\ \quad = (c_{\eta 0}(\cdot ) + v_0^{t+1}) P_{\eta 0}^t + (c_{\eta 1}(\cdot ) + v_1^{t+1}) P_{\eta 1}^t + \ldots + (c_{\eta \eta }(\cdot ) + v_\eta ^{t+1}) P_{\eta \eta }^t- v_\eta ^{t+1}\\ \quad = c_{\eta 0}(\cdot ) P_{\eta 0}^t + c_{\eta 1}(\cdot ) P_{\eta 1}^t + \ldots + c_{\eta \eta }(\cdot ) P_{\eta \eta }^t + (v_0^{t+1} - v_\eta ^{t+1}) P_{\eta 0}^t \\ \qquad + ( v_1^{t+1} - v_\eta ^{t+1}) P_{\eta 1}^t + \ldots + (v_\eta ^{t+1}-v_\eta ^{t+1}) P_{\eta \eta }^t.\\ \end{array} \end{aligned}$$

In compact form, for $l=0,1,\ldots ,\eta $, we have:

$$\begin{aligned} \begin{array}{lll} \dot{v}_l^t = \sum _{j\in [0,\eta ]} [ c_{lj}(\pi ^t,P^t) P_{lj}^t ] + [v_0^{t}-v_l^{t}\quad v_1^{t}-v_l^{t} \ldots v_{\eta }^{t}-v_l^{t}] \left[ P_{l0}^t \quad P_{l1}^t \ldots P_{l\eta }^t \right] ^T. \end{array}\nonumber \\ \end{aligned}$$

(29)

From $\xi ^t_{0k} = \dot{v}^t_0 - \dot{v}^t_k$, we then have:

$$\begin{aligned} \begin{array}{l} {{\dot{\xi }}}^t_{0k} = \dot{v}^t_0 - \dot{v}^t_k \\ \qquad = \sum _{j\in [0,\eta ]} [ c_{0j}(\pi ^t,P^t) P_{0j}^t ] + [v^{t+1}_j-v^{t+1}_0]^T_{j\in {\mathbb {Z}}} P_0^t - \sum _{j\in [0,\eta ]} [ c_{kj}(\pi ^t,P^t) P_{kj}^t ] \\ \qquad \quad - [v^{t+1}_j-v^{t+1}_k]^T_{j\in {\mathbb {Z}}} P_k^t. \end{array} \end{aligned}$$

In matrix form, we have:

$$\begin{aligned} \begin{array}{ll} {{\dot{\xi }}}^t_{0k} = [c_{00}(\cdot ) \quad \ldots \quad c_{0\eta }(\cdot )] \left[ P_{00}^t \ldots P_{0\eta }^t \right] ^T - [c_{k0}(\cdot ) \quad \ldots \quad c_{k\eta }(\cdot )] \left[ P_{k0}^t \ldots P_{k\eta }^t \right] ^T \\ \quad \qquad + [\underbrace{v_0^{t+1} - v_0^{t+1}}_{\xi ^t_{00}} \quad \ldots \quad \underbrace{v_{\eta }^{t+1}-v_0^{t+1}}_{\xi ^t_{\eta 0}}] \left[ P_{00}^t \ldots P_{0\eta }^t \right] ^T\\ \quad \qquad - [\underbrace{v_0^{t+1} - v_k^{t+1}}_{\xi ^t_{0k}} \quad \ldots \quad \underbrace{v_\eta ^{t+1}-v_k^{t+1}}_{\xi ^t_{\eta k}}] \left[ P_{k0}^t \ldots P_{k\eta }^t \right] ^T\\ \end{array} \end{aligned}$$

We know that $\xi ^t_{lj}=-\xi ^t_{jl}=-\xi ^t_{0l}+\xi ^t_{0j}$. Hence, we obtain:

$$\begin{aligned} {{\dot{\xi }}}^t_{0k} =&\xi ^t_{01}(-P_{01}^t + P_{k1}^t) + \xi ^t_{02}(-P_{02}^t + P_{k2}^t)+ \ldots + \xi ^t_{0k}(-P_{0k}^t-\sum _{j,j\ne k}P_{kj}^t) \\&+ \ldots + \xi ^t_{0\eta }(-P_{0\eta }^t + P_{k\eta }^t)+\sum _{j\in [0,\eta ]} [ c_{0j}(\pi ^t,P^t) P_{0j}^t ] - \sum _{j\in [0,\eta ]} [ c_{kj}(\pi ^t,P^t) P_{kj}^t ], \end{aligned}$$

from which we obtain (20). $\square $

Proof of Theorem 4.1

From (8), let us subtract $v_l^{t+1}$ from the LHS and RHS and obtain for all $l \in {\mathbb {Z}}$:

$$\begin{aligned} v_l^t - v_l^{t+1}&= \sum _{j \in {\mathbb {Z}}} ( c_{lj}(\pi ^t,P^t) P_{lj}^t + v_j^{t+1} P_{lj}^t ) - v_l^{t+1} \nonumber \\&= \sum _{j \in {\mathbb {Z}}} [ c_{lj}(\pi ^t,P^t) P_{lj}^t + (v_j^{t+1}- v_l^{t+1}) P_{lj}^t ]. \end{aligned}$$

(30)

In the second equality above, we use the condition $\sum _{j \in {\mathbb {Z}}} P_{lj}^t =1$ which implies $\sum _{j \in {\mathbb {Z}}} P_{lj}^t v_l^{t+1}=v_l^{t+1}$. Let us denote the derivative in discrete time by the scalar quantity $\dot{v}_l^t = v_l^t - v_l^{t+1}$ for all $l \in {\mathbb {Z}}$. Using the variable $\xi ^t_{lk} = v^t_l - v^t_k$, which represents the potential difference between two generic states, then for all $l,k \in {\mathbb {Z}}$, we have:

$$\begin{aligned} {{\dot{\xi }}}^t_{lk}&= \dot{v}^t_l - \dot{v}^t_k = \sum _{j \in {\mathbb {Z}}} \Big [ (c_{lj}(\cdot ) P_{lj}^t - c_{kj}(\cdot ) P_{kj}^t ) - \xi ^t_{lj} P_{lj}^t + \xi ^t_{kj} P_{kj}^t \Big ]. \end{aligned}$$

(31)

We are interested in finding equilibrium points where the potential difference between the value function of any pair of states is constant. When the potential difference is constant, we have a stationary solution for (18). The equilibrium points of the above dynamics can be obtained by setting (31) equal to zero, which yields:

$$\begin{aligned} 0= \sum _{j} \Big [ (c_{lj}(\cdot ) P_{lj}^t - c_{kj}(\cdot ) P_{kj}^t ) - \xi ^t_{lj} P_{lj}^t + \xi ^t_{kj} P_{kj}^t \Big ], \forall l,k \in {\mathbb {Z}}\text{. } \end{aligned}$$

Using the notation $P^t_l= [P^t_{lj}]_{j \in {\mathbb {Z}}}$ and $c_l= [c_{lj}]_{j \in {\mathbb {Z}}}$ for all $l \in {\mathbb {Z}}$, the equilibrium condition can be rewritten as:

$$\begin{aligned} \begin{array}{lll} {{\dot{\xi }}}^t_{lk} = c_l^T P_l^t - c_k^T P_k^t - P_l^{t^T} \xi ^t_l + P_k^{t^T} \xi ^t_k=0, \end{array} \end{aligned}$$

(32)

where $\xi ^t_l: = [\xi ^t_{lj}]_{j \in {\mathbb {Z}}}= [v^t_l - v^t_j]_{j \in {\mathbb {Z}}}$. In matrix form, we then have:

$$\begin{aligned} \begin{array}{lll} {{\dot{\xi }}}^t_l:= \left[ \begin{array}{ccc} \vdots \\ {{\dot{\xi }}}^t_{lk} \\ \vdots \end{array} \right] = \underbrace{\left[ \begin{array}{ccccc} \ddots &{} \vdots &{} \ddots &{} \vdots &{} \ddots \\ \ldots &{} - P_l^{t^T} &{} \ldots &{} P_k^{t^T} &{} \ldots \\ \ddots &{} \vdots &{} \ddots &{} \vdots &{} \ddots \end{array} \right] }_{A} \left[ \begin{array}{ccccc} \vdots \\ \xi ^t_l\\ \vdots \\ \xi ^t_k\\ \vdots \end{array} \right] +\underbrace{ \left[ \begin{array}{ccc} \vdots \\ c_l^T P^t_l - c_k^T P^t_k \\ \vdots \end{array} \right] }_{b}=0. \end{array} \end{aligned}$$

(33)

By using the condition $\xi ^t_{lj} = -\xi ^t_{jl}$ and $\xi ^t_{lj}=v^t_l - v^t_j=v^t_l - v^t_0 -v^t_j + v^t_0= -\xi ^t_{0\,l} +\xi ^t_{0j}$, we can express the above set of equations in the variables $\xi ^t_{0l}$ for all $l \in {\mathbb {Z}}$. Setting $\xi ^t_0: = [\xi ^t_{0j}]_{j \in {\mathbb {Z}}}= [v^t_0 - v^t_j]_{j \in {\mathbb {Z}}}$, we have:

$$\begin{aligned} {{\dot{\xi }}}^t_0 = {{\tilde{A}}} \xi ^t_0 + {{\tilde{b}}}, \end{aligned}$$

(34)

where the matrix ${{\tilde{A}}}$ and vector ${{\tilde{b}}}$ can be derived from A and vector b. Assuming a bounded support for $\omega $ and therefore also for x, denoted by $[1,\eta ]$, we obtain a generic $\eta \times \eta $ dynamical system where ${{\tilde{A}}}=[\tilde{a}^t_{ij}]_{i,j\in [1,\eta ]}$, and from which the following equilibrium point can be obtained:

$$\begin{aligned} \xi ^* = - {{\tilde{A}}}^{-1} {{\tilde{b}}} \ge 0. \end{aligned}$$

In Lemma 4.1, we illustrate a constructive way to obtain ${{\tilde{A}}}$. Hence, for the bounded support $[1,\eta ]$, system (34) can be represented in matrix form as:

$$\begin{aligned} \left[ \begin{array}{ll} {{\dot{\xi }}}^t_{01} \\ {{\dot{\xi }}}^t_{02} \\ \vdots \\ {{\dot{\xi }}}^t_{0\eta } \end{array} \right] =&\left[ \begin{array}{cccc} -P_{01}^t - \displaystyle \sum _{k,k\not = 1}P_{1k}^t &{} -P_{02}^t + P_{12}^t &{} \ldots &{} -P_{0\eta }^t + P_{1 \eta }^t\\ -P_{01}^t + P_{21}^t &{} -P_{02}^t- \displaystyle \sum _{k,k\not = 2}P_{2k}^t &{} \ldots &{} -P_{0\eta }^t + P_{2 \eta }^t\\ \vdots &{} \vdots &{} \ddots &{}\vdots \\ -P_{01}^t + P_{\eta 1}^t &{} -P_{02}^t+P_{\eta 2}^t &{} \ldots &{} -P_{0\eta }^t- \displaystyle \sum _{k,k\not = \eta }P_{\eta k}^t \end{array} \right] \left[ \begin{array}{ll} \xi ^t_{01} \\ \xi ^t_{02} \\ \vdots \\ \xi ^t_{0\eta } \end{array} \right] \nonumber \\&+\left[ \begin{array}{ccc} c_0^T P^t_0 - c_1^T P^t_1\\ c_0^T P^t_0 - c_2^T P^t_2\\ \vdots \\ c_0^T P^t_{0} - c_{\eta }^T P^t_{\eta } \\ \end{array}\right] . \end{aligned}$$

(35)

It is evident that the entries of the main diagonal of the matrix follow the law, for generic $l \in \{0,1, \ldots , \eta \}$:

$$\begin{aligned} \begin{array}{ll} {{\tilde{a}}}^t_{ll} = -P^t_{0l} - \sum _{k,k\not = i} P^t_{lk}, \\ {{\tilde{a}}}^t_{lj}= -P^t_{0j} - P^t_{lj}, \end{array} \end{aligned}$$

(36)

which are in accordance with (19). Now, note that the trace of ${{\tilde{A}}}$ is negative, namely

$$\begin{aligned} Tr({{\tilde{A}}})= \sum _{l \in \{0,\ldots ,\mu \}}{{\tilde{a}}}^t_{ll} = \sum _{l \in \{0,\ldots ,\mu \}} [-P^t_{0l} - \sum _{k,k\not = i} P^t_{lk}] <0. \end{aligned}$$

If the determinant of matrix ${\tilde{A}}$ is positive, then the time response of the dynamical system (34) is characterized by eigenvalues with negative real part, and the system is asymptotically stable. Therefore, we can conclude that the initial conditions $(\pi [N]^0,v[N]^0)$ converge to the equilibrium point $({\bar{\pi }},{\bar{v}})$:

$$\begin{aligned} \lim _{N\rightarrow \infty }\pi [N]^0={\bar{\pi }}, \qquad \lim _{N\rightarrow \infty }v[N]^0={\bar{v}}. \end{aligned}$$

$\square $

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Ramirez, S., Bauso, D. Dynamic Games with Strategic Complements and Large Number of Players. J Optim Theory Appl 197, 1–21 (2023). https://doi.org/10.1007/s10957-023-02174-8

Download citation

Received: 13 October 2020
Accepted: 28 January 2023
Published: 12 March 2023
Issue Date: April 2023
DOI: https://doi.org/10.1007/s10957-023-02174-8

Dynamic Games with Strategic Complements and Large Number of Players

Abstract

Similar content being viewed by others

Dynamic Coordination Games with Activation Costs

Zero-sum stochastic games with the average-value-at-risk criterion

Zero-Sum Average Cost Semi-Markov Games with Weakly Continuous Transition Probabilities and a Minimax Semi-Markov Inventory Problem

1 Introduction

2 Mean-Field Inventory Game

Definition 2.1

3 Optimal Thresholds

4 Stationarity

Remark 4.1

Lemma 4.1

Proof

Theorem 4.1

Proof

5 Numerical Analysis

6 Conclusions

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

Proof of Lemma 4.1

Proof of Theorem 4.1

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Dynamic Games with Strategic Complements and Large Number of Players

Abstract

Similar content being viewed by others

Dynamic Coordination Games with Activation Costs

Zero-sum stochastic games with the average-value-at-risk criterion

Zero-Sum Average Cost Semi-Markov Games with Weakly Continuous Transition Probabilities and a Minimax Semi-Markov Inventory Problem

1 Introduction

2 Mean-Field Inventory Game

Definition 2.1

3 Optimal Thresholds

4 Stationarity

Remark 4.1

Lemma 4.1

Proof

Theorem 4.1

Proof

5 Numerical Analysis

6 Conclusions

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

Appendix

Proof of Lemma 4.1

Proof of Theorem 4.1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation