Red and Blue are playing a game. Each player has two moves to choose from. In one turn, Red secretly writes down a move and Blue secretly writes down a move; then the two moves are revealed.
If Red chooses move 1 and Blue chooses move 1, Blue gets 3 points.
If Red chooses move 1 and Blue chooses move 2, Blue gets 5 points.
If Red chooses move 2 and Blue chooses move 1, Blue gets 6 points.
If Red chooses move 2 and Blue chooses move 2, Blue gets 4 points.
Here is a payoff matrix showing the possible outcomes (points won by Blue):

            Blue 1   Blue 2
    Red 1      3        5
    Red 2      6        4
It's not a fair game at all, since Red always loses. But Red can still try to lose as few points as possible, while Blue tries to win as many points as possible.
What is the best strategy for each player?
The Compleat Strategyst does not mention Nash equilibria in the index, though it was published in 1954, a few years after John Nash introduced the concept and ten years after von Neumann and Morgenstern's Theory of Games and Economic Behavior. Blue may randomize, but move 1 and move 2 need not have the same frequency. Does Blue have a preferred distribution?
Oh, did I leave out the part about the electric shocks? The book explains that points always go to Blue to make the analysis simpler. The reader is told to imagine that Red is compelled to play. Alternatively, Red can demand to be paid to participate. Assuming Red uses optimal strategy, what is the minimum number of points per turn that Red should demand before agreeing to play? And how many points per turn can Blue afford to offer Red on the side before Blue loses the advantage?
I'm either overthinking this or we're missing some vital piece of information. If Red is compelled to play under penalty of death, then Red doesn't care what move he makes as he gets no points either way. From Blue's perspective, Red then has a 50/50 chance of playing move 1 or move 2, so Blue must choose between a 50/50 shot at 3 or 6 points (move 1) and a 50/50 shot at 5 or 4 points (move 2). Since Blue averages 4.5 points either way, this is a question of how risk averse Blue is. If Blue has to pay Red to play, and assuming Red can refuse to play (thus denying Blue any points), then Red would demand a minimum of 2 points, as that's the only fee that leaves Blue guaranteed to win points (the 3-point outcome at move 1 versus move 1 becomes 0 for Blue if Red demands 3). But even if Red demands 3 points, it still comes down to how risk averse Blue is: does Blue choose move 2, winning either 5 or 4 points and paying 3 to Red, for a net of 2 or 1 point? Or does Blue choose move 1 and risk netting 0 points for the chance of netting 3? Either way, this seems to be a question of how risk averse Blue is, since Red's payoff isn't determined by which move he makes.
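If it helps, here's a throwaway check of that arithmetic. The 50/50 Red and the 2- or 3-point side fees are just the assumptions from the comment above:

```python
# Blue's payoffs, indexed by (Red's move, Blue's move).
payoff = {(1, 1): 3, (1, 2): 5, (2, 1): 6, (2, 2): 4}

# Against a compelled, indifferent Red (50/50 over moves 1 and 2),
# both of Blue's moves have the same expected value.
for blue in (1, 2):
    ev = sum(payoff[(red, blue)] for red in (1, 2)) / 2
    print(f"Blue move {blue}: expected {ev}")  # 4.5 both times

# Blue's worst-case net after paying Red a side fee of 2 or 3 points per turn.
for fee in (2, 3):
    print(f"fee {fee}: Blue's worst case nets {min(payoff.values()) - fee}")  # 1, then 0
```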
"Red doesn't care what move he makes as he gets no points either way"

We suppose that Red wants Blue to win as few points as possible. If Red can negotiate a payment before each turn from Blue, both sides can strategize to end up with as many points as possible in the long run.

"This seems to be a question of how risk averse Blue is"

It could be. In previous, simpler examples, the book says each player's priority is to avoid the worst possible loss. I am not sure if this is the only way it could work; perhaps a player could also prioritize having a chance at the best possible outcome, without regard to the worst cases. These complications will probably become more important in more complex games.
I choose my numbers randomly, switching back and forth so Blue can't predict my choices, but favor the number 1 at least twice as often as I favor 2 to minimize Blue's winnings. I also call the officials a bunch of doody heads for not allowing me to win through belligerent uncooperativeness.
The officials have noted your position, and wonder why you did not elect to play as Blue. Let's evaluate the strategy of Red playing 1 twice as often as 2. We assume Blue is a profit-maximizing mofo and will play to win as many points as possible. When Blue plays 1, Red expects to pay (2 x 3) + (1 x 6) = 12 over three turns, an average of 4 per turn. When Blue plays 2, Red expects to pay (2 x 5) + (1 x 4) = 14 over three turns, an average of 4 2/3 per turn. No matter what Blue does, this is an improvement over Red playing 1 all the time, which has a worst case of 5 per turn.
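For anyone checking the arithmetic, here's a small sketch of that evaluation (expected_payment is just a name for illustration):

```python
payoff = {(1, 1): 3, (1, 2): 5, (2, 1): 6, (2, 2): 4}  # (red, blue) -> points to Blue

def expected_payment(red_weights, blue_move):
    """Red's expected payment per turn, given Red's mixing weights
    over moves 1 and 2 and a fixed pure move for Blue."""
    total = sum(red_weights.values())
    return sum(w * payoff[(red, blue_move)] for red, w in red_weights.items()) / total

two_to_one = {1: 2, 2: 1}  # play 1 twice as often as 2
print(expected_payment(two_to_one, 1))  # 4.0
print(expected_payment(two_to_one, 2))  # 4.666...
```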
Let's remove all the psychology from the game, and assume that Blue's and Red's strategies are independent. Let's say Blue chooses 1 with probability p, and Red chooses 1 with probability q. Then the expected value of one round is

value = 3 * (probability both chose 1: pq)
      + 5 * (probability Blue chose 2, Red chose 1: (1 - p)q)
      + 6 * (probability Blue chose 1, Red chose 2: p(1 - q))
      + 4 * (probability Blue chose 2, Red chose 2: (1 - p)(1 - q))

Expanding:

value = 3pq + 5(1 - p)q + 6p(1 - q) + 4(1 - p)(1 - q)
      = 3pq + 5q - 5pq + 6p - 6pq + 4 - 4p - 4q + 4pq
      = 4 - 4pq + 2p + q
      = (2p - 0.5)(1 - 2q) + 4.5

Now, if p = 0.25, the expected value is 4.5 (for Blue); if p is different from that, the expected value may be less, depending on q. If p > 0.25, any q > 0.5 gives a value less than 4.5; if p < 0.25, any q < 0.5 gives a value less than 4.5. That means the optimal strategy for Blue is to choose 1 with probability 25% and 2 with probability 75%.

In very much the same way, if q = 0.5, the expected value is 4.5 (for Blue); if q > 0.5, then p < 0.25 gives a value bigger than 4.5, and if q < 0.5, then p > 0.25 gives a value bigger than 4.5. Since Red wants to minimize the value, Red's optimal strategy is q = 0.5, i.e. choose 1 and 2 with the same probability.
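A quick numerical check of that algebra (nothing here beyond the formula just derived):

```python
def value(p, q):
    """Expected points to Blue per round; Blue plays 1 with probability p,
    Red plays 1 with probability q."""
    return 3*p*q + 5*(1 - p)*q + 6*p*(1 - q) + 4*(1 - p)*(1 - q)

# At p = 0.25 the q-dependence vanishes; at q = 0.5 the p-dependence vanishes.
for q in (0.0, 0.25, 0.5, 1.0):
    assert abs(value(0.25, q) - 4.5) < 1e-12
for p in (0.0, 0.25, 0.75, 1.0):
    assert abs(value(p, 0.5) - 4.5) < 1e-12
print("value is pinned at 4.5 by either side's optimal mix")
```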
I did some computer experiments, just for fun.

Both playing optimal strategy: p = 0.25, q = 0.5. 100000 rounds, total value 449756.
Now Blue deviates to p = 0.5; Red is clever enough to choose q = 1. Same number of rounds: 400438. Blue got less than it could.
Say Red deviates to q = 0.25 and Blue is clever enough to choose p = 1. Same number of rounds: 524991. Red lost more than it had to.

If Blue plays with optimal p = 0.25, Red cannot do anything to lose less than 4.5 per round (see above: the whole q-dependent part is multiplied by 0). For p = 0.25, q = 0.75, 100000 rounds: total value 450115. In a similar way, if Red plays with q = 0.5, Blue cannot do anything to win more than 4.5 per round. For q = 0.5, p = 0.75: result is 449799.
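A sketch of the kind of simulation used here (my own reconstruction; seeds and exact totals will vary run to run):

```python
import random

def simulate(p, q, rounds=100_000, seed=None):
    """Total points to Blue over many rounds; Blue plays 1 with probability p,
    Red plays 1 with probability q."""
    rng = random.Random(seed)
    payoff = {(1, 1): 3, (1, 2): 5, (2, 1): 6, (2, 2): 4}  # (red, blue) -> Blue's points
    total = 0
    for _ in range(rounds):
        red = 1 if rng.random() < q else 2
        blue = 1 if rng.random() < p else 2
        total += payoff[(red, blue)]
    return total

print(simulate(p=0.25, q=0.5))   # ~450000: both optimal, 4.5 per round
print(simulate(p=0.5, q=1.0))    # ~400000: Blue deviates, Red exploits
print(simulate(p=1.0, q=0.25))   # ~525000: Red deviates, Blue exploits
```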
These are silly. In a game you can't lose, you should invest no resources into your decision making and play as many instances of this game simultaneously as possible to maximize return. The worst I can do is 3 points per game as Blue. The best I can do is 6 points per game as Blue. If I play two games simultaneously without devoting any decision to either of them, the worst I can do as Blue is 6 points, which is the former best. Because there are no resources devoted to deciding how to play, this can scale infinitely and with untrained players.
The point of the exercise is to determine the best strategy on each turn. Blue may not be able to play as often and as rapidly as desired. The payoffs are all positive for Blue just to make the analysis simpler. The same strategy would apply if all the payoffs were five points lower:

R1 B1: Red gets 2
R1 B2: no points
R2 B1: Blue gets 1
R2 B2: Red gets 1
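One way to see that the shifted game has the same solution: the mixing recipe described below uses only differences between payoffs, and subtracting a constant leaves every difference alone. A throwaway check:

```python
original = {"R1B1": 3, "R1B2": 5, "R2B1": 6, "R2B2": 4}
shifted = {k: v - 5 for k, v in original.items()}

# The mixed-strategy recipe below uses only payoff *spreads*,
# and spreads are unchanged by subtracting a constant from everything.
def spreads(m):
    return (abs(m["R1B1"] - m["R1B2"]), abs(m["R2B1"] - m["R2B2"]),
            abs(m["R1B1"] - m["R2B1"]), abs(m["R1B2"] - m["R2B2"]))

assert spreads(original) == spreads(shifted)
print(spreads(original))  # (2, 2, 3, 1): same mixes either way
```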
After working my way through Chapter 2, I think the analysis is as follows. First, we look for a "saddle point" (probably the same as the Nash equilibrium flagamuffin mentioned). Each side assumes the other side will play optimally. So if Red plays 1, Red assumes Blue will play 2 and win five instead of three. And if Red plays 2, Red assumes Blue will play 1 and win six instead of four. Red would rather pay five than six, so Red's preferred move is 1, with a value (loss) of five. Blue faces the prospect of winning only three with move 1 versus winning at least four with move 2. So Blue prefers 2, to win at least four. Red's best worst case is five; Blue's best worst case is four. These don't match, so we conclude that a "pure" strategy (always making the same move) is not optimal for either side.

Next, we determine the best mixed strategy for each side. With move 1, Red is exposed to a difference of two in the possibilities for Blue. With move 2, Red is again exposed to a difference of two depending on how Blue plays. So Red doesn't prefer either move, and should play both moves with equal frequency. Now Red doesn't care how Blue plays: whenever Blue plays 1, Red will pay 3 or 6 with equal frequency, averaging 4.5; whenever Blue plays 2, Red will pay 5 or 4, again averaging 4.5.

Determining Blue's strategy should confirm the value. With move 1, Blue is exposed to a difference of three depending on what Red does. With move 2, the difference in outcomes is one. So Blue's best strategy is to prefer move 2 at a ratio of three-to-one. Whenever Red plays 1, Blue wins (1x3 + 3x5) / 4 = 4.5. Whenever Red plays 2, Blue wins (1x6 + 3x4) / 4 = 4.5. So Red plays at 1:1, Blue plays at 1:3, and the value of the game is 4.5.
It doesn't have a name; it's more of a recipe. I am not sure I understand why it works. The idea is to measure the difference between the two outcomes if you choose a move, and use that difference as the relative proportion with which you will choose the other move. You ignore how good or bad the outcomes are, and favor moves with a narrow spread in outcomes. It only works in these simple 2x2 games, but 2xN games can be reduced to 2x2 games. The idea is to eliminate any advantage the adversary can gain with strategy, and I see how using the derived proportion forces the other side to accept the same outcome regardless of what move they make. (Interestingly, if the other side fails to optimize, but stupidly chooses the same move every time, or uses a suboptimal distribution, the result will not change. Optimal strategy by one side forces a fair outcome for both, where "fair" means the best that either side can hope for when both sides use optimal strategy.)
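Here is the recipe as a rough sketch in code, with the saddle-point test from the previous comment folded in. The book presents this as a pencil-and-paper method; solve_2x2 and its return conventions are mine:

```python
from fractions import Fraction

def solve_2x2(a, b, c, d):
    """Solve a 2x2 zero-sum game with payoffs to Blue laid out as:
           Blue 1  Blue 2
    Red 1    a       b
    Red 2    c       d
    Returns (Red's mix, Blue's mix, game value)."""
    # Saddle-point check: Red's best worst case vs Blue's best worst case.
    red_best_worst = min(max(a, b), max(c, d))    # Red minimizes his worst payment
    blue_best_worst = max(min(a, c), min(b, d))   # Blue maximizes his worst winnings
    if red_best_worst == blue_best_worst:
        return None, None, red_best_worst  # saddle point: pure strategies suffice
    # The recipe: weight each move by the payoff spread of the OTHER move.
    q = Fraction(abs(c - d), abs(c - d) + abs(a - b))   # P(Red plays 1)
    p = Fraction(abs(b - d), abs(b - d) + abs(a - c))   # P(Blue plays 1)
    value = a*p*q + b*(1 - p)*q + c*p*(1 - q) + d*(1 - p)*(1 - q)
    return (q, 1 - q), (p, 1 - p), value

red, blue, value = solve_2x2(3, 5, 6, 4)
print(red, blue, value)  # Red mixes 1:1, Blue mixes 1:3, value 9/2 = 4.5
```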