Wang Haihua
🚅 🚋😜 🚑 🚔
One big area of repeated games is specifically the Prisoner's Dilemma. We have previously defined this as the following game:
$$ A = \begin{pmatrix} 3 & 0\\ 5 & 1 \end{pmatrix}\qquad B = \begin{pmatrix} 3 & 5\\ 0 & 1 \end{pmatrix} $$The general form is:
$$ A = \begin{pmatrix} R & S\\ T & P \end{pmatrix}\qquad B = \begin{pmatrix} R & T\\ S & P \end{pmatrix} $$with the following constraints:
$$T > R > P > S$$$$2R > T + S$$This game is a good model of agent (human, etc) interaction: a player can choose to take a slight loss of utility for the benefit of the other play and themselves.
As a single one shot game there is not much more to say about the Prisoner's dilemma. It becomes fascinating when studied as a repeated game.
In 1980, Robert Axelrod (a political scientist) invited submissions to a computer tournament version of an iterated prisoners dilemma. This was described in a 1980 paper titled "Effective Choice in the Prisoner's Dilemma".
The fact that Tit For Tat won garnered a lot of research (still ongoing) as it showed a mathematical model of how cooperative behaviour can emerge in complex situations (why are we nice to each other?).
There is a Python library (axelrod
) with over 200 strategies that can be used to reproduce this work. You can read the documentation for it here: http://axelrod.readthedocs.io.
In 1989 a particular family of strategies was introduced by Martin Nowak. These strategies are defined by two parameters: $(p_1, p_1)$ where:
Consider two reactive players:
$$ p=(p_1, p_2) \qquad q=(q_1, q_2) $$If we consider the order of possible states of a match to be:
$$S=\{CC, CD, DC, DD\}$$then we can summarise a game with the following matrix:
$$ M = \begin{pmatrix} p_1q_1 & p_1(1-q_1) & (1-p_1)q_1 & (1-p_1)(1-q_1) \\ p_2q_1 & p_2(1-q_1) & (1-p_2)q_1 & (1-p_2)(1-q_1) \\ p_1q_2 & p_1(1-q_2) & (1-p_1)q_2 & (1-p_1)(1-q_2) \\ p_2q_2 & p_2(1-q_2) & (1-p_2)q_2 & (1-p_2)(1-q_2) \\ \end{pmatrix} $$The matrix $M$ corresponds to a Markov chain. Given a probability vector $\pi$ representing the probability of being in a given state of $S$: the probabilities of being in the a given step in the next round are given by:
$$\pi M$$If we consider:
$$ p=(1 / 4, 4 / 5) \qquad q=(2 / 5, 1 / 3) $$below is some code that calculates the probabilities over 20 turns starting with both strategies cooperating:
import numpy as np
import matplotlib.pyplot as plt
v_1 = np.array([1,0])
v_2 = np.array([2 / 5, 1 / 3])
M = np.array([[v_1[0] * v_2[0], v_1[0] * (1 - v_2[0]), (1 - v_1[0]) * v_2[0], (1 - v_1[0]) * (1 - v_2[0])],
[v_1[1] * v_2[0], v_1[1] * (1 - v_2[0]), (1 - v_1[1]) * v_2[0], (1 - v_1[1]) * (1 - v_2[0])],
[v_1[0] * v_2[1], v_1[0] * (1 - v_2[1]), (1 - v_1[0]) * v_2[1], (1 - v_1[0]) * (1 - v_2[1])],
[v_1[1] * v_2[1], v_1[1] * (1 - v_2[1]), (1 - v_1[1]) * v_2[1], (1 - v_1[1]) * (1 - v_2[1])]])
pis = [np.array([1, 0, 0, 0])]
number_of_turns = 20
for _ in range(number_of_turns):
pis.append(np.dot(pis[-1], M))
labels = ["CC", "CD", "DC", "DD"]
for state, label in zip(zip(*pis), labels):
plt.plot(state, label=label)
plt.legend();
We see that over time these probabilities no longer change: this is referred to as steady state. A probability vector $\pi$ at steady state is a solution to the following matrix equation:
$$\pi M=\pi$$The steady state of a match between (non deterministic) reactive players is given by:
$$ \pi=(s_1s_2, s_1(1-s_2), (1-s_1)s_2, (1-s_1)(1-s_2)) $$where:
$$ s_1 = \frac{q_2r_1+p_2}{1-r_1r_2}\qquad s_2 = \frac{p_2r_2+q_2}{1-r_1r_2} $$for:
$$r_1=p_1-p_2\qquad r_2=q_1-q_2$$The proof follow (after some heavy algebra) by carrying out the following multiplication:
$$ \pi M = (s_1s_2, s1(1-s_2), (1-s_1)s_2, (1-s_1)(1-s_2)) \begin{pmatrix} p_1q_1 & p_1(1-q_1) & (1-p_1)q_1 & (1-p_1)(1-q_1) \\ p_2q_1 & p_2(1-q_1) & (1-p_2)q_1 & (1-p_2)(1-q_1) \\ p_1q_2 & p_1(1-q_2) & (1-p_1)q_2 & (1-p_1)(1-q_2) \\ p_2q_2 & p_2(1-q_2) & (1-p_2)q_2 & (1-p_2)(1-q_2) \\ \end{pmatrix} $$Using this we can obtain the expected utility of the first player:
$$s_1s_2\times R + s1(1-s_2) \times S + (1-s_1)s_2 \times T + (1-s_1)(1-s_2)\times P$$The second player:
$$s_1s_2\times R + s1(1-s_2) \times T + (1-s_1)s_2 \times S + (1-s_1)(1-s_2)\times P$$def theoretic_steady_state(p, q):
r_1 = p[0] - p[1]
r_2 = q[0] - q[1]
s_1 = (q[1] * r_1 + p[1]) / (1 - r_1 * r_2)
s_2 = (p[1] * r_2 + q[1]) / (1 - r_1 * r_2)
return np.array([s_1 * s_2, s_1 * (1 - s_2), (1 - s_1) * s_2, (1 - s_1) * (1 - s_2)])
def theoretic_utility(p, q, rstp=np.array([3, 0, 5, 1])):
pi = theoretic_steady_state(p, q)
return np.dot(pi, rstp)
theoretic_utility(v_1, v_2), theoretic_utility(v_2, v_1)
(1.943877551020408, 1.943877551020408)
We can confirm this using the Axelrod library:
player_1 = axl.ReactivePlayer(probabilities=v_1)
player_2 = axl.ReactivePlayer(probabilities=v_2)
match = axl.Match(players=(player_1, player_2), turns=5000)
interactions = match.play()
match.final_score_per_turn()
(1.9434, 1.9434)