Wang Haihua
Given a two player game $(A,B)\in{\mathbb{R}^{m\times n}}^2$, referred to as a stage game, a $T$-stage repeated game is a game in which players play that stage game for $T > 0$ consecutive periods. Players make their decisions based on the full history of play over all previous periods.
For example, consider the game:
$$ A = \begin{pmatrix} 0 & 6 & 1\\ 1 & 7 & 5 \end{pmatrix} \qquad B = \begin{pmatrix} 0 & 3 & 1\\ 1 & 0 & 1 \end{pmatrix} $$

By identifying the best responses:
$$ A = \begin{pmatrix} 0 & 6 & 1\\ \underline{1} & \underline{7} & \underline{5} \end{pmatrix} \qquad B = \begin{pmatrix} 0 & \underline{3} & 1\\ \underline{1} & 0 & \underline{1} \end{pmatrix} $$

it is immediate to find two Nash equilibria:
$$((0,1), (1,0,0))\qquad((0,1), (0,0,1))$$

```python
import nashpy as nash
import numpy as np

A = np.array([[0, 6, 1],
              [1, 7, 5]])
B = np.array([[0, 3, 1],
              [1, 0, 1]])
game = nash.Game(A, B)
list(game.support_enumeration())
```

```
c:\users\agustin\appdata\local\programs\python\python38\lib\site-packages\nashpy\algorithms\support_enumeration.py:196: RuntimeWarning: An even number of (2) equilibria was returned. This indicates that the game is degenerate. Consider using another algorithm to investigate.
  warnings.warn(warning, RuntimeWarning)

[(array([0., 1.]), array([1., 0., 0.])), (array([0., 1.]), array([0., 0., 1.]))]
```
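The warning tells us that the game is degenerate and suggests trying another algorithm. As a quick cross-check (a minimal sketch, reusing the `game` object defined above), we can also run nashpy's vertex enumeration algorithm:

```python
# Cross-check support enumeration with a second algorithm; for a degenerate
# game it is worth confirming that the same equilibria are recovered.
for eq in game.vertex_enumeration():
    print(eq)
```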
If we repeat this game twice ($T=2$) we obtain a new game. However, to be able to reason about this new game we first need to define what a strategy in a repeated game is.
Given a two player stage game $(A,B)\in{\mathbb{R}^{m\times n}}^2$ repeated to give a $T$-stage repeated game, a strategy is a mapping from the entire history of play to an action of the stage game:

$$ \bigcup_{t=0}^{T-1}H(t)\to a $$

where:

- $H(t)$ is the set of possible histories of play after $t$ stages (so $H(0)$ contains only the empty history $(\emptyset, \emptyset)$);
- $a$ is an action of the stage game.
To help avoid confusion: whilst we have previously referred to pure strategies as the choices made in a stage game, here we will call those choices actions.
The actions for our example are $\{r_1, r_2\}$ for the row player and $\{c_1, c_2, c_3\}$ for the column player.
A strategy for the row/column player thus needs to map an element of the following set to an element of $\{r_1, r_2\}$/$\{c_1, c_2, c_3\}$:
$$\bigcup_{t=0}^{1}H(t)=\{(\emptyset, \emptyset), (r_1, c_1), (r_1, c_2), (r_1, c_3), (r_2, c_1), (r_2, c_2), (r_2, c_3)\}$$

In other words, in our example, a strategy answers both of the following questions:

- What action should be played in the first stage, when the history is $(\emptyset, \emptyset)$?
- What action should be played in the second stage, for each of the six possible histories of the first stage?
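As an illustration (a small sketch added here; the string labels are just stand-ins for the actions named above), the seven histories in $\bigcup_{t=0}^{1}H(t)$ can be enumerated directly:

```python
from itertools import product

row_actions = ("r1", "r2")
col_actions = ("c1", "c2", "c3")

# H(0) holds only the empty history; H(1) holds every action profile that
# could have been played in the first stage.
histories = [((), ())] + list(product(row_actions, col_actions))
print(len(histories))  # 1 + 2 * 3 = 7
print(histories)
```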
The following theorem allows us to find Nash equilibria of a repeated game: for any repeated game, any sequence of stage Nash profiles gives a Nash equilibrium.
To prove this, consider the following strategy:
The row/column player should play action $a_{r/c}$ regardless of the play of any previous strategy profiles.
where $(a_{r}, a_{c})$ is a given stage Nash equilibrium.
Using backwards induction: at the last stage $(a_r, a_c)$ is a Nash equilibrium of the stage game, so no player has a reason to deviate there. Since play at the last stage does not depend on what happened earlier, the same argument applies at the $(T-1)$th stage, and so on back to the first stage. The proof follows.
Thus, for our example we have the four Nash equilibria (writing, for instance, $r_2r_2$ for the strategy that plays $r_2$ in both stages):

- $(r_2r_2, c_1c_1)$
- $(r_2r_2, c_1c_3)$
- $(r_2r_2, c_3c_1)$
- $(r_2r_2, c_3c_3)$
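As a small check (a sketch added here, not part of the original notes), we can sum the stage payoffs for each of these four sequences of stage Nash profiles; recall that $(r_2, c_1)$ yields $(1, 1)$ and $(r_2, c_3)$ yields $(5, 1)$:

```python
from itertools import product

import numpy as np

A = np.array([[0, 6, 1], [1, 7, 5]])
B = np.array([[0, 3, 1], [1, 0, 1]])

# The two pure stage Nash equilibria as (row index, column index):
# (r2, c1) and (r2, c3).
stage_nash = [(1, 0), (1, 2)]

# Sum the stage payoffs for every sequence of stage Nash profiles over T = 2.
for first, second in product(stage_nash, repeat=2):
    print(first, second, (A[first] + A[second], B[first] + B[second]))
```

No such sequence gives the row player more than $10$ or the column player more than $2$, which is what makes the strategies considered below interesting.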
Note however that these are not the only equilibria of our repeated game.
In a repeated game it is possible for players to encode reputation and trust in their strategies.
Consider the following two strategies:
For the row player:
$$(\emptyset, \emptyset) \to r_1$$ $$(r_1, c_1) \to r_2$$ $$(r_1, c_2) \to r_2$$ $$(r_1, c_3) \to r_2$$
For the column player:
$$(\emptyset, \emptyset) \to c_2$$ $$(r_1, c_2) \to c_3$$ $$(r_2, c_2) \to c_1$$
Note that here we omit some of the histories which are not possible based on the first play by each player.
This strategy corresponds to the following scenario:
Play $(r_1, c_2)$ in the first stage and $(r_2, c_3)$ in the second stage, unless the row player does not cooperate in the first stage, in which case play $(r_2, c_1)$ in the second stage.
If both players play these strategies their utilities are $(11, 4)$, which is better for both players than the utilities at any sequence of pure stage Nash equilibria. But is this a Nash equilibrium? To find out we investigate whether either player has an incentive to deviate.

If the row player deviates to $r_2$ in the first stage they gain $7$ instead of $6$, but the column player then plays $c_1$ in the second stage, where the row player can obtain at most $1$: a total of at most $8 < 11$. Deviating only in the second stage gives the row player $1$ instead of $5$ against $c_3$. If the column player deviates in the first stage, the row player still plays $r_2$ in the second stage, where the column player can obtain at most $1$, giving a total of at most $1 + 1 = 2 < 4$; deviating only in the second stage gives the column player at most $1$, no better than the prescribed $c_3$.
Thus neither player has an incentive to deviate: this strategy pair is a Nash equilibrium, and it shows how a reputation can be built and cooperation can emerge in a repeated game.
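As a final computational check (a sketch added here; the encoding of the strategies as dictionaries is illustrative, not part of the original text), we can play out the strategies and enumerate all deviations for each player while the other sticks to their strategy:

```python
import numpy as np

A = np.array([[0, 6, 1], [1, 7, 5]])
B = np.array([[0, 3, 1], [1, 0, 1]])

# Action indices: r1 = 0, r2 = 1; c1 = 0, c2 = 1, c3 = 2.  Each strategy is a
# first-stage action together with a map from the observed first-stage
# profile to a second-stage action.
row_first, row_second = 0, {(0, 0): 1, (0, 1): 1, (0, 2): 1}
col_first, col_second = 1, {(0, 1): 2, (1, 1): 0}

# Utilities on the prescribed path of play: (r1, c2) then (r2, c3).
first = (row_first, col_first)
second = (row_second[first], col_second[first])
print(A[first] + A[second], B[first] + B[second])  # 11 4

# Best total payoff the row player can reach by deviating in either stage
# while the column player follows their strategy (opening with c2).
best_row = max(
    A[a1, col_first] + max(A[a2, col_second[(a1, col_first)]] for a2 in range(2))
    for a1 in range(2)
)

# Best total payoff the column player can reach by deviating in either stage
# while the row player follows their strategy (opening with r1).
best_col = max(
    B[row_first, c1] + max(B[row_second[(row_first, c1)], c2] for c2 in range(3))
    for c1 in range(3)
)

print(best_row, best_col)  # 11 4: neither player can do strictly better
```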