# Blog

## Casino Math: Markov Chains

### Introduction

Understanding and applying Markov chains is an essential component of calculating probabilities in casino games that would otherwise become unwieldy. I have used Markov chains in calculating probabilities associated with popular slot features such as collection bonuses, sticky Wilds, and Lightning Link-style bonuses.

I typically use Markov chains in games where there is a reasonable number of states the player can pass through. The definition of reasonable depends on time constraints and the computational processing power available. Because matrix multiplication is involved, processing time grows cubically with the number of states.

### The Basics

To model a game as a Markov process we must define the following items:

1. A well-defined state space $\mathcal{S}=\{s_1, s_2, \dots, s_N\}$. This is simply the set of all the states the player can be in.
2. A probability distribution over the initial states, $X_0$, i.e., the probability of starting in each state. $X_0$ is typically represented as a $1\times N$ row vector.
3. The transition matrix, $T$. Each element of $T$ defines the probability of moving from one state to another: the probability of transitioning from state $s_i$ to state $s_j$ is given by the $i$th row and $j$th column of $T$. Note that it is essential that $T$ does not change from round to round (the chain must be time-homogeneous).

From here we can easily determine the state probability distribution $X_n$ at each step in the process:

$X_n = X_0 T^n$

$X_n$ is a $1\times N$ row vector that represents the probability of being in each state after the $n$th step of the process.

### Example

Consider a game where a player is given an equal chance of starting with 1, 2, or 3 fair coins. At the beginning of each round all coins are flipped, and every coin that comes up heads is removed. The game is played until all coins have been removed. As a prize for making it to each successive round, the player is paid \$1 at the beginning of the first round, \$2 at the beginning of the second round, \$3 at the beginning of the third, and so on.

To model this game as a Markov process we first define all the states the player can be in at each round: 1) no coins removed, 2) one coin removed, 3) two coins removed, and 4) all coins removed (game over).

Since the player has an equal chance of starting with 3, 2, or 1 coins (i.e., with 0, 1, or 2 coins already removed), we define the initial state vector like so:

$X_0 = \left[\frac{1}{3} \ \frac{1}{3} \ \frac{1}{3} \ 0 \right]$

It usually takes a little more work to determine the transition matrix. For this game it is defined as follows:

$T = \left[\begin{array}{cccc}\frac{1}{8} & \frac{3}{8} & \frac{3}{8} & \frac{1}{8} \\ 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\0 & 0 & \frac{1}{2} & \frac{1}{2} \\0 & 0 & 0 & 1 \\ \end{array} \right]$

From here we can determine the state distribution vector at each round of the game…

$X_n = X_0 T^n = \left[\frac{1}{3} \ \frac{1}{3} \ \frac{1}{3} \ 0 \right] \left[\begin{array}{cccc}\frac{1}{8} & \frac{3}{8} & \frac{3}{8} & \frac{1}{8} \\ 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\0 & 0 & \frac{1}{2} & \frac{1}{2} \\0 & 0 & 0 & 1 \\ \end{array} \right]^n = \left[ p_{1,n} \ p_{2,n} \ p_{3,n} \ p_{4,n} \right]$

where

$\begin{array}{rl} p_{1,n} &= \frac{8^{-n}}{3} \\ p_{2,n} &= \frac{4^{-n}}{3}+\frac{1}{3}\left(3\cdot 4^{-n}-3\cdot 8^{-n}\right) \\ p_{3,n} &= \frac{2^{-n}}{3}+\frac{1}{3}\left(2^{1-n}-2^{1-2n}\right)+\frac{1}{3}\left(3\cdot 2^{-n}-3\cdot 2^{1-2n}+3\cdot 8^{-n}\right) \\ p_{4,n} &= \frac{1}{3}\left(1-2^{-n}\right)+\frac{1}{3}\left(1-2^{1-n}+4^{-n}\right)+\frac{1}{3}\left(1-3\cdot 2^{-n}+3\cdot 4^{-n}-8^{-n}\right) \end{array}$
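If you'd rather skip the symbolic algebra entirely, the distribution is easy to check numerically. A minimal sketch with NumPy, using exactly the $T$ and $X_0$ defined above:

```python
import numpy as np
from numpy.linalg import matrix_power

# Transition matrix: states are 0, 1, 2, or 3 coins removed
T = np.array([
    [1/8, 3/8, 3/8, 1/8],
    [0.0, 1/4, 1/2, 1/4],
    [0.0, 0.0, 1/2, 1/2],
    [0.0, 0.0, 0.0, 1.0],
])

# Equal chance of 0, 1, or 2 coins already removed at the start
X0 = np.array([1/3, 1/3, 1/3, 0.0])

def state_distribution(n):
    """X_n = X_0 T^n: the state distribution after n rounds."""
    return X0 @ matrix_power(T, n)

# Spot-check against the closed form p_{1,n} = 8^{-n} / 3 at n = 3
X3 = state_distribution(3)
print(np.isclose(X3[0], 8.0**-3 / 3))  # True
print(np.isclose(X3.sum(), 1.0))       # True: still a probability distribution
```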

Side note: In case you’re wondering how to get a nice closed-form expression for $T^n$, you can take a look at this example. In my case, I cheated and used Mathematica. 🙂

Each value $p_{i,n}$ represents the probability of being in state $s_i$ during round $n$. Note that based on the above equations it is clear that $\lim_{n\rightarrow\infty}p_{4,n} = 1$, implying that the game is guaranteed to end given enough time. A few more interesting properties of this game can be uncovered by analyzing these equations. For example, on average, how many rounds of this game can the player expect to play?

$\text{Expected rounds}=\sum_{n=1}^\infty (p_{1,n}+p_{2,n}+p_{3,n})=\frac{101}{63}\approx 1.603$

How about the value of the game itself?

$\text{Expected game value}=\sum_{n=1}^\infty (p_{1,n}+p_{2,n}+p_{3,n})n=\frac{4580}{1323}\approx\$3.46$
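Both sums are easy to reproduce numerically; since the probabilities decay geometrically, truncating at a couple hundred terms is far more than double precision needs. A sketch using the same matrix as above:

```python
import numpy as np
from numpy.linalg import matrix_power

T = np.array([
    [1/8, 3/8, 3/8, 1/8],
    [0.0, 1/4, 1/2, 1/4],
    [0.0, 0.0, 1/2, 1/2],
    [0.0, 0.0, 0.0, 1.0],
])
X0 = np.array([1/3, 1/3, 1/3, 0.0])

expected_rounds = 0.0
expected_value = 0.0
for n in range(1, 200):
    Xn = X0 @ matrix_power(T, n)
    in_play = Xn[0] + Xn[1] + Xn[2]   # probability the game is still going
    expected_rounds += in_play
    expected_value += in_play * n

print(round(expected_rounds, 3))  # 1.603  (= 101/63)
print(round(expected_value, 2))   # 3.46   (= 4580/1323)
```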

What other interesting properties from this game can you discover by modeling the game as a Markov chain?

## Jensen’s Inequality

The modern philosopher Nassim Nicholas Taleb says that one should never cross a river that is on average 4 feet deep. The intuition is clear: averages are a borderline useless metric when the payoff structure (here, the discomfort of being at a certain depth of water) is non-linear. To put it another way, being in 8 feet of water is not merely twice as bad as being in 4 feet. Jensen’s Inequality captures this idea in a single elegant statement:

Let $f$ be a convex function and $X$ be an arbitrary random variable. Then

$f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$

In the example above we could think of $X$ to be water depth and $f$ to be our relative discomfort. When pain is convex (disproportionately uncomfortable) with respect to some input, our expected pain is much worse than the pain of the average input.
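A quick numerical sketch makes the inequality concrete. The uniform depth distribution and the quadratic discomfort function below are my own illustrative choices, not anything canonical:

```python
import random

random.seed(0)

# Hypothetical convex discomfort: pain grows with the square of depth
def f(depth):
    return depth ** 2

# Depth uniform on [0, 8] feet, so the average depth is 4 feet
samples = [random.uniform(0, 8) for _ in range(100_000)]

mean_depth = sum(samples) / len(samples)
f_of_mean = f(mean_depth)                              # f(E[X]) ≈ 16
mean_of_f = sum(f(x) for x in samples) / len(samples)  # E[f(X)] ≈ 21.3

print(f_of_mean <= mean_of_f)  # True: Jensen's Inequality
```

The pain at the average depth is about 16, but the expected pain is about 21.3 (the exact value is $64/3$): the deep stretches of the river hurt far more than the shallow stretches help.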

This can be flipped into a more positive framing. Think of how much more enjoyable it is to eat a single large meal in the evening than to micromanage your calorie intake with 10 smaller meals spread evenly throughout the day. I guarantee your total enjoyment from all those little snacks won’t add up to be anywhere close to as enjoyable as that dinner at the end of the day. Or, to paraphrase Socrates, hunger is the best spice.

Try to think of more applications of Jensen’s Inequality. When else is the average utility greater than the utility of the average?

## The Kelly Criterion

I have an open secret I’d like to share: I love gambling. It’s one of my favorite pastimes. If you don’t like gambling that’s fine. We’ll probably never be best friends, but we can get along. Now if you’re the type of person who will judge people that enjoy gambling, that’s where we’re going to start running into problems. How would you like it if I judged you for loving Judd Apatow movies? You’d probably think I’m a jerk. Well bug off, this post isn’t for you!

So there are a couple of things you should know about me. I’m a mathematics enthusiast (obviously), but I was also born and raised in Vegas (I lived there for the first 27 years of my life). I have also worked in the casino gaming industry as a game designer and mathematician. Let me tell you something: you CAN’T win; not in the long run, at least. I’m sure most of you know this, but there is still a small minority of people who think they have it figured out with some system (martingale betting being the most common mistake among less savvy gamblers). So if I can’t win, then why do I enjoy it so much? Well, maybe I’m just an adrenaline junkie. It’s a roller-coaster ride. I don’t even know if I enjoy the money so much as the high of beating the odds against a rigged system. And I’m happy to pay a (reasonable) premium for that thrill.

Now don’t get me wrong, some people can beat the system. See Ed Thorp or Jeff Ma, who applied betting systems to beat blackjack. Or Haralabob Voulgaris, who has made a very nice living betting on NBA games. Then there’s the endless supply of poker pros such as Phil Ivey and Daniel Negreanu. I’m sure most of them will tell you that poker isn’t gambling, and I’m inclined to agree, but we still associate poker with the casino.

No matter what kind of gambling you’re into, you must at least be familiar with one formula: the Kelly Criterion. And I am using gambling in the loosest sense of the word. The Kelly Criterion can actually be applied to any quantifiable stake where your return depends on an uncertain outcome. This includes poker, the stock market, or evaluating a job offer!

The formula is given by:

$f = \frac{p(b + 1) - 1}{b} = p - \frac{1-p}{b}$

Let $p$ be the probability of winning a bet with payoff odds of $b$ (i.e., for every dollar you place on a bet, a win pays back $b$ dollars in profit). Then $f$ is the percentage of your bankroll that you should place on that bet in order to maximize your long-run returns. Note: A negative value implies you should place your stake on the other side of that bet.
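The formula is a one-liner in code. The two bets below (a 60% edge at even money, and a fair coin paying 100 to 1) are just illustrative numbers:

```python
def kelly_fraction(p: float, b: float) -> float:
    """Optimal fraction of bankroll to stake: f = p - (1 - p) / b."""
    return p - (1 - p) / b

# A 60% chance to win an even-money bet (b = 1): stake 20% of bankroll
print(round(kelly_fraction(0.6, 1.0), 3))    # 0.2

# A fair coin flip paying 100 to 1: still only 49.5% of bankroll
print(round(kelly_fraction(0.5, 100.0), 3))  # 0.495
```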

The first thing I noticed about this equation was that it seemed way too conservative. For example, take the limit of $b\rightarrow\infty$. We get

$\lim_{b\rightarrow\infty} f = p$

If we set $p=\frac{1}{2}$ we get $f\rightarrow\frac{1}{2}$. So no matter how large your payoff is for a 50-50 outcome, you should never risk more than half of your current bankroll. This flies in the face of a cursory examination of expected value. Say we get paid 100 to 1 on a coin flip. The expected return for every dollar we place into this wager is $\frac{1}{2}(\$100) + \frac{1}{2}(-\$1) = \$49.50$, so shouldn’t we place as much money as possible into this bet to maximize our returns? Absolutely not! And the reason is risk of ruin.

You don’t want to be put in a position where you can lose your entire bankroll, even with a near-infinite payoff. After repeated trials you’re still eventually going to lose everything, and you’re going to have to start again from 0.

So clearly any fraction of your bankroll less than 100% will avoid risk of ruin. Why not bet 90% of our bankroll on a positive-EV bet? Well, the Kelly fraction is derived precisely to maximize your long-run rate of return (the expected growth rate of your bankroll). I have to admit it still feels overly conservative, but that’s one of the beautiful things about math: it doesn’t care how you feel. You start with your assumptions, you go through the motions, and you observe the results.
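To see "maximizes long-run growth" in action, here is a toy Monte Carlo sketch. The bet (a fair coin paying 2 to 1) and all the parameters are my own made-up example; the takeaway is that the median final bankroll from Kelly-sized bets dwarfs the one from betting 90% every round:

```python
import random

random.seed(1)

def median_final_bankroll(fraction, p=0.5, b=2.0, rounds=500, trials=201):
    """Median bankroll after repeatedly staking a fixed fraction of it."""
    finals = []
    for _ in range(trials):
        bankroll = 1.0
        for _ in range(rounds):
            stake = bankroll * fraction
            # Win pays b times the stake; a loss forfeits the stake
            bankroll += stake * b if random.random() < p else -stake
        finals.append(bankroll)
    return sorted(finals)[len(finals) // 2]

kelly = 0.5 - (1 - 0.5) / 2.0  # f = 0.25 for p = 1/2, b = 2
print(median_final_bankroll(kelly) > median_final_bankroll(0.9))  # True
```

The 90% bettor has positive expected value on every single wager, yet the typical outcome is a bankroll ground down to essentially nothing, while the Kelly bettor's typical bankroll grows exponentially.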

One side effect of the Kelly Criterion is it also makes a great case for diversification. Don’t put your entire bankroll on any single investment! No matter how great the return may look, if there’s even a 1% chance it could go bust you have to put some of your money somewhere else!

So let me ask a question: Can you think of any exceptions to using the Kelly Criterion? When might you want to be more or less aggressive? No right or wrong answer here. Just curious what you can come up with.

## The Bellman Equation

I’ve been taking classes in AI and Machine Learning and I’ve already bumped into this one on a few separate occasions. In my experience I usually see it in the context of Markov Decision Processes. Here it is:

$V^\pi(s) = R(s,\pi(s)) + \gamma \sum_{s'} P(s'|s,\pi(s)) V^\pi(s')$

This guy is pretty nasty looking, but he’s actually not so bad once you get to know him. First let me list out what everything in there is supposed to represent…

• $V^\pi(s)$: Think of this as the value of being in a given state of the world ($s$) given that you tend to behave in a certain way ($\pi$).
• $\pi$: In machine learning we call this the policy, but that’s just another way of saying an agent’s behavior. It maps the current state of the world to an action $\pi(s)=a$. Think of it this way: Let’s say my state $s$=”hungry.” This state tends to be mapped to the action $a$=”go eat a sandwich.”
• $R(s,\pi(s))$: Now we’re looking at the right-hand side of the equation. This is what’s called the reward function. It’s just a number in arbitrary units that tells us how good (or bad) it feels to be in state $s$, and performing action $\pi(s)=a$.
• Note: If we stopped here the equation above would look kind of stupid, right? The value of an action at a given state is equal to the reward from taking an action at a given state? Seems kind of redundant. Luckily it has more to say!
• $\gamma$: The discount rate where $0 \leq\gamma<1$. More on this later.
• $P(s'|s,\pi(s))$: Think of $s'$ as any state of the world after performing your action $\pi(s)=a$. Say you finish your sandwich; then your new state could be “I’m not hungry” or “I’m now dating Selena Gomez.” $P(s'|s,\pi(s))$ represents the probability of transitioning to that new state, given your current state and action.

With the meaning of these variables defined, we can think of the second term in the equation as an expected value: specifically, the expected value of our future returns given our present behavior:

$V^\pi(s) = R(s,\pi(s)) + \gamma \mathbb{E}[V^\pi(S)|s,\pi]$
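For concreteness, here is a minimal policy-evaluation sketch on a two-state toy MDP. All of the numbers ($P$, $R$, $\gamma$) are made up for illustration; the point is just that iterating the Bellman equation converges to the value function, which we can verify against the closed form $V = (I-\gamma P)^{-1}R$:

```python
import numpy as np

# Toy 2-state MDP under a fixed policy pi (all numbers invented).
# P[s, s'] is the transition probability under pi's action in state s;
# R[s] is the reward for taking pi's action in state s.
P = np.array([
    [0.7, 0.3],   # from state 0
    [0.4, 0.6],   # from state 1
])
R = np.array([1.0, -0.5])
gamma = 0.9

# Repeatedly apply the Bellman equation until V stops changing
V = np.zeros(2)
for _ in range(1000):
    V_new = R + gamma * P @ V
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

# Closed form for comparison: V = (I - gamma P)^{-1} R
V_exact = np.linalg.solve(np.eye(2) - gamma * P, R)
print(np.allclose(V, V_exact))  # True
```

Because the Bellman update is a $\gamma$-contraction, the iteration converges for any $0\leq\gamma<1$ regardless of the starting guess.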

Even though you won’t typically see the Bellman Equation unless you take some very specialized coursework in machine learning, I couldn’t help but feel a sense of familiarity the first time I saw it. Then it hit me: my economics classes! Every business student is familiar with discounted cash flow. In this case we’re not trying to calculate the net present value of an investment; we’re trying to calculate the net present value of an agent’s behavior.

The discount rate $\gamma$ tells us to what degree we should value the outcomes of our future behavior. With $\gamma$ close to 0, an agent becomes hedonistic, concerned only with the current state of the world. With $\gamma$ close to 1, the agent is willing to forgo present reward for future gain, but perhaps to a fault: who cares if I have a guaranteed way of becoming a billionaire 200 years from now? The value of $\gamma$ is up to us; it’s a matter of striking the right balance between the present and the future.