War. Huh. Good God. What is it Good For?


About 250 turns, apparently.

Motivation

My friend Scott and I sat down at our weekly Trivia venue about 20 minutes before showtime. "Hershel's said he's gonna be a bit late, he got held up with something." No bother. Busy job, these things happen.

We hurriedly ordered a couple beers apiece while they were still on Happy Hour and spent the next while eyeing the front window, waiting for our third to finally wander in.

For my money, the hardest question of the night is always "What the hell is our team name gonna be??" We waffle for almost literally the next 20 minutes and finally settle on Snape Kills Dumbledore on Page 596 (one of our better ones, tbh). And like clockwork, in walks Hershel right as we're turning in our registration. "Hey, sorry. Got caught up in a really exciting game of War."

And I about short-circuit at the unintentional oxymoron.

The "Game"

For those of you who've never been a child with a deck of cards before, War is a pastime that basically looks like the following:

  • Split a deck across two players
  • Players blindly play the top card of their decks
    • Higher card takes both cards
  • If the cards match, each player discards two cards, then plays a third. This is called a War.
    • Whoever plays the higher third card wins the War. If those cards match too, repeat the War step
  • When a player runs out of cards to draw from, they shuffle their discards to create a new draw pile
  • Repeat, ad nauseam, until the game is over
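
For the curious, those rules fit in a few dozen lines of plain Python. What follows is a minimal sketch, not the simulator from the codebase: the names Player and play_game, the turn cap, and the rule that running out of cards mid-War means you lose are all assumptions made for illustration.

```python
import random
from collections import deque

RANKS = list(range(2, 15))  # assumed encoding: 2..10, J=11, Q=12, K=13, A=14


class Player:
    """A draw pile plus a discard pile that gets reshuffled when the draw pile empties."""

    def __init__(self, cards, rng):
        self.draw = deque(cards)
        self.discard = []
        self.rng = rng

    def total(self):
        return len(self.draw) + len(self.discard)

    def play(self):
        """Return the next card, or None if the player has nothing left."""
        if not self.draw:
            if not self.discard:
                return None
            self.rng.shuffle(self.discard)
            self.draw.extend(self.discard)
            self.discard = []
        return self.draw.popleft()


def play_game(seed=None, max_turns=100_000):
    """Play one game; return (winner, turns). Winner is 'a', 'b', or None (tie or cap hit)."""
    rng = random.Random(seed)
    deck = RANKS * 4  # suits never matter in War
    rng.shuffle(deck)
    a, b = Player(deck[:26], rng), Player(deck[26:], rng)

    for turn in range(1, max_turns + 1):
        pot = []
        while True:
            card_a, card_b = a.play(), b.play()
            if card_a is None or card_b is None:
                # Assumption: whoever cannot produce a card loses on the spot.
                if card_a is None and card_b is None:
                    return None, turn
                return ("b" if card_a is None else "a"), turn
            pot += [card_a, card_b]
            if card_a != card_b:
                break
            # A War: each player sets aside two cards, then the loop plays a third.
            for _ in range(2):
                for player in (a, b):
                    card = player.play()
                    if card is not None:
                        pot.append(card)
        # The higher card takes everything played this turn.
        (a if card_a > card_b else b).discard.extend(pot)
        if a.total() == 0 or b.total() == 0:
            return ("a" if b.total() == 0 else "b"), turn
    return None, max_turns


winner, turns = play_game(seed=42)
print(winner, turns)
```

Passing a seed pins down both the opening deal and every later reshuffle of the discards, so the same seed always replays the same game.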

And that's it. There's no strategy. No choice. You just go back and forth and back and forth until the game just sort of... ends.

The outcome of the game is decided as soon as you shuffle both decks and set them in front of the players. You could literally determine who wins from the outset, obviating the need to even go through the motions. It's a complete and total waste of time, especially when there's trivia to be played. Honestly, it wouldn't be that hard to build a simulator to do th-- Wait a minute.

And so I spent a good chunk of my free time over the next couple weeks pettily doing just that.

The Data

I've mostly been doing ETLs and model development in PySpark the past few months, so this felt like as good an excuse as any to practice some Object Oriented Design in pure Python.

And that went well enough. Until I started hitting bugs and edge cases I hadn't considered. So if you're checking out the codebase, dive into war.Game.run_turn() at your own peril. Turns out neatly abstracting state and interdependencies gets tricky fast, haha.

Ultimately, the workflow I built for this project meant firing off main.py from the project linked above. In this file, I specified how many games I wanted to simulate, and it would run them, saving the game state for every turn, until I had a whole mess of files called data0.txt, data1.txt, ...

Per Turn

Later, I load those text files into neat tables that look like the following (abbreviated to the first 10 turns of the first game, here).

In [2]:
whole_games = load_whole_games(1)

whole_games.head(10)
Out[2]:
   num_a  num_b  num_aces_a  num_aces_b  num_kings_a  num_kings_b  wars  game
0     26     26           2           2            1            3     0     0
1     27     25           2           2            1            3     0     0
2     28     24           2           2            1            3     1     0
3     32     20           2           2            2            2     0     0
4     31     21           2           2            2            2     0     0
5     32     20           2           2            2            2     0     0
6     31     21           2           2            2            2     0     0
7     30     22           2           2            2            2     0     0
8     29     23           2           2            2            2     0     0
9     30     22           2           2            2            2     0     0

Looking across the top, you'll see that the attributes I captured per turn were:

  • The number of cards that Player A and Player B have (deck and discard combined)
  • The number of aces and kings each player has (more on this later)
  • How many times the players went to War that turn
  • An index of which game I'm looking at
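
To make those columns concrete, here's a rough sketch of how one turn's row could be assembled from the two players' cards. The snapshot function and the numeric rank encoding are hypothetical stand-ins, not the logging methods in the codebase.

```python
ACE, KING = 14, 13  # assumed rank encoding: 2..10, then J=11, Q=12, K=13, A=14


def snapshot(cards_a, cards_b, wars, game):
    """Summarize one turn as a dict keyed by the per-turn column names.

    cards_a and cards_b are each player's full holdings (deck and discards combined).
    """
    return {
        "num_a": len(cards_a),
        "num_b": len(cards_b),
        "num_aces_a": cards_a.count(ACE),
        "num_aces_b": cards_b.count(ACE),
        "num_kings_a": cards_a.count(KING),
        "num_kings_b": cards_b.count(KING),
        "wars": wars,
        "game": game,
    }


# Toy hands, far smaller than a real game, just to show the shape of a row.
row = snapshot([14, 13, 2, 7], [14, 14, 13, 13, 5], wars=1, game=0)
print(row)
```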

And if I load ten thousand of these files, it's a pretty big table.

In [3]:
whole_games = load_whole_games(10000)

whole_games.shape
Out[3]:
(3183346, 8)

Of course, I ran ten times that amount for this post. Which just means that I've got almost a gigabyte of text files just taking up space on my computer.

Per Game

Additionally, I built a parser that will go through and grab the first and last rows of each game file. It uses this to look at starting and end conditions so it can summarize each game.

In [4]:
results = get_game_summaries()

len(results)
Out[4]:
100000

Data fields include:

  • Which game I'm looking at
  • How many aces and kings Player A started with (more on this later)
  • If Player A won the game
  • If Player A won the first round (both players exhausting their first 26-card stack)

In [5]:
results.head(25)
Out[5]:
     game  a_starting_aces  a_starting_kings  a_won  a_won_first_round
 0      0                2                 1   True              False
 1      1                0                 2  False              False
 2     10                2                 3   True               True
 3    100                2                 1   True               True
 4   1000                0                 1  False              False
 5  10000                1                 3  False              False
 6  10001                1                 3   True              False
 7  10002                4                 2   True               True
 8  10003                3                 2   True              False
 9  10004                1                 1  False              False
10  10005                1                 2  False              False
11  10006                2                 3  False               True
12  10007                3                 2   True               True
13  10008                3                 2   True               True
14  10009                3                 3   True               True
15   1001                2                 1  False               True
16  10010                4                 3   True               True
17  10011                3                 3  False               True
18  10012                2                 4  False               True
19  10013                2                 2  False               True
20  10014                2                 3   True               True
21  10015                1                 2  False              False
22  10016                4                 0   True               True
23  10017                3                 2   True               True
24  10018                3                 4   True              False
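
The first-row, last-row idea behind those summaries can be sketched in a few lines. This is an illustration only: summarize_game is a hypothetical helper, and it assumes the winner is whoever holds more cards in a game's final row. It leaves out a_won_first_round, which can't be recovered from just these two rows under those assumptions.

```python
def summarize_game(rows):
    """Build one summary record from a game's per-turn rows (a list of dicts)."""
    first, last = rows[0], rows[-1]
    return {
        "game": first["game"],
        "a_starting_aces": first["num_aces_a"],
        "a_starting_kings": first["num_kings_a"],
        # Assumption: the player holding more cards in the final row won.
        "a_won": last["num_a"] > last["num_b"],
    }


# The opening row matches turn 0 of the per-turn table; the closing row is made up.
rows = [
    {"num_a": 26, "num_b": 26, "num_aces_a": 2, "num_aces_b": 2,
     "num_kings_a": 1, "num_kings_b": 3, "wars": 0, "game": 0},
    {"num_a": 52, "num_b": 0, "num_aces_a": 4, "num_aces_b": 0,
     "num_kings_a": 4, "num_kings_b": 0, "wars": 0, "game": 0},
]
summary = summarize_game(rows)
print(summary)
```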

The Art of War

As soon as I had a simulator cooked up that would correctly run and resolve games, I started sketching out visualizations that I'd want to make. From there, I had a good idea of what data I would want to capture during the simulations and doubled back into my code and wrote a bunch of logging methods.

Wins and Losses

If we marry the every-turn dataset to the every-game dataset, we can make neat plots that layer multiple games on top of one another.

In [6]:
games_and_results = whole_games.merge(results, on='game')

Here, I plot the first 100 games that I simulated. As you can see, there's a pretty even distribution between wins and losses, and most of the games resolve within the first few hundred turns.

In [7]:
plot_wins_vs_losses(games_and_results, num_games=100, linealpha=.2);

Now, let's look at the first thousand games.

One thing I want to point out is the last argument in my plotting call, linealpha=.1. This essentially means that every line that gets plotted on the figure is about 90% see-through.

So when the first ~500 turns are basically a mess of solidly-colored red and green, you're seeing the result of many, many overlapping games and outcomes.
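
A bit of arithmetic shows how many overlapping lines "many, many" actually takes. The sketch below assumes standard alpha blending of same-colored lines, where each extra line covers a fraction alpha of whatever background still shows through. The combined_opacity helper is made up for illustration and is not part of the plotting code.

```python
def combined_opacity(alpha, n_lines):
    """Opacity at a point where n_lines same-colored lines of the given alpha overlap."""
    return 1 - (1 - alpha) ** n_lines


# How quickly do 10%-opaque lines pile up into something that looks solid?
for n in (1, 5, 22, 50):
    print(n, round(combined_opacity(0.1, n), 3))
```

Under that assumption, it takes roughly 22 stacked lines at linealpha=.1 before a spot is more than 90% opaque.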

Furthermore, if you compare the x-axis of this plot with the last one, you'll notice we've stumbled across games that go on for 2,000+ turns, which is just bananas.

In [8]:
plot_wins_vs_losses(games_and_results, num_games=1000, linealpha=.1);

And then ten thousand games, because why not?

In [9]:
plot_wins_vs_losses(games_and_results, 10000, .05, .1);