The Kelly Criterion

I have an open secret I’d like to share: I love gambling. It’s one of my favorite pastimes. If you don’t like gambling that’s fine. We’ll probably never be best friends, but we can get along. Now if you’re the type of person who will judge people that enjoy gambling, that’s where we’re going to start running into problems. How would you like it if I judged you for loving Judd Apatow movies? You’d probably think I’m a jerk. Well bug off, this post isn’t for you!

So there are a couple things you should know about me. I’m a mathematics enthusiast (obviously), but I was also born and raised in Vegas (I lived there for the first 27 years of my life). I have also worked in the casino gaming industry as a game designer and mathematician. Let me tell you something: You CAN’T win; not in the long run at least. I’m sure most of you know this, but there is still a small minority of people who think they have it figured out with some system (martingale betting being the most common mistake among the less savvy gamblers). So if I can’t win then why do I enjoy it so much? Well, maybe I’m just an adrenaline junkie. It’s a roller-coaster ride. I don’t even know if I enjoy the money so much as the high of beating the odds against a rigged system. And I’m happy to pay a (reasonable) premium for that thrill.

Now don’t get me wrong, some people can beat the system. See Ed Thorp or Jeff Ma, who applied betting systems for beating blackjack. Or Haralabob Voulgaris, who has made a very nice living off betting on NBA games. Then there’s the endless number of poker pros such as Phil Ivey and Daniel Negreanu. I’m sure most of them will tell you that poker isn’t gambling, and I’m inclined to agree, but we still associate poker with the casino.

No matter what kind of gambling you’re into, you must at least be familiar with one single formula: The Kelly Criterion. And I am using gambling in the absolute loosest sense of the word. The Kelly Criterion can actually be applied to any quantifiable stake where your return depends on an uncertain outcome. This includes poker, the stock market, or evaluating a job offer!

The formula is given by:

 f =  \frac{p(b + 1) - 1}{b}  = p - \frac{1-p}{b} 

Let p be the probability of winning a bet with payoff odds of b (i.e. for every dollar you place on the bet, you win \$b on top of getting your stake back). Then f is the fraction of your bankroll that you should place on that bet in order to maximize your long run returns. Note: A negative value implies you should place your stake on the other side of that bet.
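If you'd rather let a computer do the arithmetic, here is a minimal Python sketch of the formula above. The function name kelly_fraction and the example bets are my own illustrative choices, not anything standard.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Fraction of bankroll to stake, given win probability p and payoff odds b."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if b <= 0:
        raise ValueError("b must be positive")
    return p - (1.0 - p) / b


# An even-money bet (b = 1) that wins 60% of the time.
print(kelly_fraction(0.6, 1.0))
# A fair coin flip that pays 100 to 1.
print(kelly_fraction(0.5, 100.0))
# A losing proposition: the result comes out negative.
print(kelly_fraction(0.4, 1.0))
```

The first bet comes out to roughly a fifth of your bankroll, and the third comes out negative, which is the formula's way of telling you that you're on the wrong side of the bet.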

The first thing I noticed about this equation was that it seemed way too conservative. For example, take the limit of b\rightarrow\infty. We get

\lim_{b\rightarrow\infty} f = p

If we set p=\frac{1}{2} we get f\rightarrow\frac{1}{2}. So no matter how large your payoff is for a 50-50 outcome, you should never risk more than half of your current bankroll. This flies in the face of a cursory examination of expected value. Say we get paid 100 to 1 on a coin flip. The expected return for any dollar we place into this wager is \frac{1}{2} (\$ 100) + \frac{1}{2}(-\$1) = \$49.50 , so shouldn’t we place as much money as possible into this bet to maximize our returns? Absolutely not! And the reason is risk of ruin.

You don’t want to be put in a position where you can lose your entire bankroll, even with a near infinite payoff. If you bet everything every time, a single loss wipes you out, and after repeated trials that loss is eventually going to come. Then you’re going to have to start again from 0.

So clearly any fraction of your bankroll less than 100% will avoid risk of ruin. Why not bet 90% of our bankroll on a positive EV bet? Well, the Kelly fraction is derived as the one that maximizes your long run rate of return, meaning the expected logarithm of your bankroll’s growth per bet. Bet more than that and each loss sets you back further than your wins can make up for over time. I have to admit it still feels overly conservative, but that’s one of the beautiful things about math: It doesn’t care how you feel. You start with your assumptions, you go through the motions, and you observe the results.
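To convince myself, it helps to compare growth rates directly. The sketch below assumes the usual setup behind the formula: each bet multiplies your bankroll by (1 + bf) on a win and (1 - f) on a loss, so the quantity to maximize is the expected log of that multiplier. The name log_growth_rate and the list of fractions are just illustrative.

```python
import math


def log_growth_rate(f, p, b):
    """Expected log growth per bet when staking fraction f of the bankroll."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)


# The coin flip that pays 100 to 1.
p, b = 0.5, 100.0
kelly = p - (1.0 - p) / b

for f in (0.10, kelly, 0.90, 0.99):
    rate = log_growth_rate(f, p, b)
    print(f"f = {f:.3f}  expected log growth per bet = {rate:.3f}")
```

Betting 90% still grows your bankroll on this particular bet, just more slowly than the Kelly fraction does, and pushing toward 99% drags the growth rate down to about zero.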

One side effect of the Kelly Criterion is it also makes a great case for diversification. Don’t put your entire bankroll on any single investment! No matter how great the return may look, if there’s even a 1% chance it could go bust you have to put some of your money somewhere else!

So let me ask a question: Can you think of any exceptions to using the Kelly Criterion? When might you want to be more or less aggressive? No right or wrong answer here. Just curious what you can come up with.

The Bellman Equation

I’ve been taking classes in AI and Machine Learning and I’ve already bumped into this one on a few separate occasions. In my experience I usually see it in the context of Markov Decision Processes. Here it is:

 V^\pi(s) = R(s,\pi(s)) + \gamma \sum_{s'} P(s'|s,\pi(s)) V^\pi(s')

This guy is pretty nasty looking, but he’s actually not so bad once you get to know him. First let me list out what everything in there is supposed to represent…

  •  V^\pi(s) : Think of this as the value of being in a given state of the world (s) given that you tend to behave in a certain way (\pi).
  • \pi: In machine learning we call this the policy, but that’s just another way of saying an agent’s behavior. It maps the current state of the world to an action \pi(s)=a. Think of it this way: Let’s say my state s=”hungry.” This state tends to be mapped to the action a=”go eat a sandwich.”
  • R(s,\pi(s)): Now we’re looking at the right-hand side of the equation. This is what’s called the reward function. It’s just a number in arbitrary units that tells us how good (or bad) it feels to be in state s, and performing action \pi(s)=a.
    • Note: If we stopped here the equation above would look kind of stupid, right? The value of an action at a given state is equal to the reward from taking an action at a given state? Seems kind of redundant. Luckily it has more to say!
  • \gamma: The discount rate where 0 \leq\gamma<1. More on this later.
  • P(s'|s,\pi(s)): Think of s' as any state of the world after performing your action \pi(s)=a. Say you finish your sandwich, then your new state could be “I’m not hungry” or “I’m now dating Selena Gomez.” P(s'|s,\pi(s)) represents the probability of transition to that new state, given your current state and action.

With the meaning of these variables defined we can think of the 2nd term in the equation as an expected value. Specifically, the expected value of our future returns given our present behavior:

 V^\pi(s) = R(s,\pi(s)) + \gamma \mathbb{E}[V^\pi(s')|s,\pi(s)]
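Here is a rough Python sketch of how you could actually compute V^\pi for a toy world by applying the equation over and over until the numbers settle. Everything specific in it is made up for illustration: the two states, the rewards, the transition probabilities, and the helper name evaluate_policy.

```python
# Hypothetical two-state world under one fixed policy pi.
# reward[s] stands in for R(s, pi(s)); transition[s][s2] for P(s2 | s, pi(s)).
reward = {"hungry": -1.0, "full": 2.0}
transition = {
    "hungry": {"hungry": 0.1, "full": 0.9},  # pi(hungry) = "go eat a sandwich"
    "full": {"hungry": 0.5, "full": 0.5},    # pi(full) = "do nothing"
}
gamma = 0.9


def evaluate_policy(reward, transition, gamma, tol=1e-10):
    """Repeatedly apply the Bellman equation until V stops changing."""
    value = {s: 0.0 for s in reward}
    while True:
        new_value = {
            s: reward[s]
            + gamma * sum(prob * value[s2] for s2, prob in transition[s].items())
            for s in reward
        }
        if max(abs(new_value[s] - value[s]) for s in reward) < tol:
            return new_value
        value = new_value


V = evaluate_policy(reward, transition, gamma)
for state, v in V.items():
    print(f"V({state}) = {v:.3f}")
```

Because \gamma is less than 1, each pass shrinks the error, so the loop is guaranteed to stop. Being full ends up more valuable than being hungry, but being hungry isn't worth a negative amount, since the policy gets you to a sandwich soon enough.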

Even though you won’t typically see the Bellman Equation unless you take some very specialized coursework in machine learning, I couldn’t help but feel a sense of familiarity with this one the first time I saw it. Then it hit me: In my economics classes! Every business student is familiar with discounted cash flow. In this case we’re not trying to calculate the net present value of an investment, we’re trying to calculate the net present value of an agent’s behavior.

The discount rate \gamma tells us to what degree we should value the outcome of our future behavior. With a \gamma close to 0 the agent becomes hedonistic, concerned only with the current state of the world. With a \gamma close to 1 the agent is willing to forgo present reward for future gain, but perhaps to a fault. Who cares if I have a guaranteed way of becoming a billionaire 200 years from now? The value of \gamma is up to us, but it’s a matter of striking the right balance between the present and the future.
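A tiny Python sketch makes the trade-off concrete. The two reward streams below (a small snack now versus a feast five steps from now) are invented for illustration, as is the helper name discounted_return.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


snack_now = [1, 0, 0, 0, 0, 0]     # reward of 1 right away
feast_later = [0, 0, 0, 0, 0, 10]  # reward of 10, but five steps out

for gamma in (0.1, 0.5, 0.9):
    now = discounted_return(snack_now, gamma)
    later = discounted_return(feast_later, gamma)
    choice = "snack now" if now > later else "feast later"
    print(f"gamma = {gamma}: now = {now:.4f}, later = {later:.4f} -> prefers {choice}")
```

With these numbers the agent only holds out for the feast once \gamma gets close to 1.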

Net Present Value

Honestly, this equation doesn’t tickle me like others do. It’s practical and it’s good to know, but it’s a little too cold-hearted for my tastes. There’s just no love in it. 😦 However I still really want to talk about it as a lead-up to my next post, which will be about the Bellman Equation.

Before we dive deeper into the Bellman Equation let’s look at how you would evaluate some investment s, where the best return on an alternative investment is given by 1+i:

 NPV(s) = \sum_{t=0}^{\infty} \frac{R_t}{(1+i) ^{t}}

NPV(s) is the present value of some investment s, R_t is the return it pays out at time t, and i is the rate you could earn on the alternative.

What this is trying to tell us is that we shouldn’t just look at the nominal returns of some asset, but we should discount any future returns by how far out in the future we receive them. Think of it this way: A dollar today is more valuable than a dollar tomorrow. Why? We could invest that dollar right now and receive some (small) return. Or the dollar might not be as valuable tomorrow because of inflation. Or we could just straight up get hit by a comet by the time we get to enjoy the fruits of our newfound fortune! That’s what the discount factor \frac{1}{1+i} is trying to factor in: Our value of the present over the value of the future.
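In code the formula is a one-liner. This is only a sketch: it assumes the cash flows stop after a finite number of periods (the sum in the formula runs forever), and the investment in the example is made up.

```python
def npv(cash_flows, i):
    """Present value of cash_flows[t] received at t = 0, 1, 2, ..."""
    return sum(r / (1.0 + i) ** t for t, r in enumerate(cash_flows))


# Hypothetical investment: pay 1000 today, receive 300 a year for four years.
flows = [-1000, 300, 300, 300, 300]

for rate in (0.00, 0.05, 0.10):
    print(f"i = {rate:.2f}  NPV = {npv(flows, rate):.2f}")
```

The same investment looks like a winner when the alternative pays nothing or 5%, and like a loser when the alternative pays 10%.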

Now I’m going to ask you a question to help that discount factor sink in: How much would you pay for a million bucks in your bank account right now? That sounds kind of stupid, right? You’d probably pay any value up to a million bucks. How about a year down the road? Or 50 years down the road? Try to derive a value for \frac{1}{1+i} based on how much you would pay to receive that money in the future.
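If you want to turn your gut answer into a number, here is one way to back it out in Python. It assumes the same discount applies every year, so the price you'd pay today equals the future amount times the yearly factor raised to the number of years. The function name implied_discount and the dollar figures are illustrative only.

```python
def implied_discount(price_today, future_amount, years):
    """Return (yearly discount factor 1/(1+i), implied yearly rate i)."""
    factor = (price_today / future_amount) ** (1.0 / years)
    return factor, 1.0 / factor - 1.0


# Would pay 900,000 today for a million one year from now.
factor, i = implied_discount(900_000, 1_000_000, 1)
print(f"1 year:   factor = {factor:.4f}, i = {i:.4f}")

# Would pay 100,000 today for a million 50 years from now.
factor, i = implied_discount(100_000, 1_000_000, 50)
print(f"50 years: factor = {factor:.4f}, i = {i:.4f}")
```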

Bayesian Robustness

As an addendum to my previous post I would like to show how sensitive that type of analysis is to our prior assumptions. Why would we do this? Well, let’s say we’re not really sure what a good estimate is for our prior assumptions. We can instead choose a range of values and see what happens to our posterior analysis.

I used a beta distribution to generate random values for P(X|\theta), P(X), and P(\theta) (with means centered around 0.99, 0.2, and 0.0001, respectively). After generating those random values I obtained the following density for P(\theta|X):


This shows a heavily weighted right tail. For this particular example the skewness comes from our assumption about P(X); extremely small values in the denominator can cause the posterior to blow up. Nevertheless, the average state of our beliefs (E[P(\theta|X)]\approx 0.004) says that we can be comfortable about our assumptions.
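For anyone who wants to poke at this kind of analysis, here is a rough Python sketch of the procedure. The means come from the text above, but the spread of each beta distribution (the concentration values) is purely my assumption here, as are the sample size, the seed, and the helper names, so don't expect it to reproduce the exact density or averages quoted in this post.

```python
import random
import statistics


def beta_sample(rng, mean, concentration):
    """Draw from a beta distribution with the given mean (higher concentration = tighter)."""
    return rng.betavariate(mean * concentration, (1.0 - mean) * concentration)


def simulate_posterior(n=10_000, seed=0):
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        likelihood = beta_sample(rng, 0.99, 100.0)  # P(X | theta)
        evidence = beta_sample(rng, 0.2, 50.0)      # P(X)
        prior = beta_sample(rng, 0.0001, 1000.0)    # P(theta)
        # Independent draws are not guaranteed to be coherent, so cap at 1.
        samples.append(min(1.0, likelihood * prior / evidence))
    return samples


posterior = simulate_posterior()
print("mean  :", statistics.mean(posterior))
print("median:", statistics.median(posterior))
print("max   :", max(posterior))
```

Comparing the mean against the median is a quick way to see the right tail without drawing a plot: a mean sitting well above the median means a handful of large values are doing the heavy lifting.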

We could stop here, but I would like to dive into a little risk analysis. Wouldn’t you say that even a 1% risk is too much of a risk to allow Jeff Bezos to walk freely in American society, knowing he has a large potential for harm? Let’s say that if Bezos is an upstanding citizen, he provides some arbitrary unit of benefit to society (say +1 utility). However, if he is more concerned with bringing about the New World Order than selling Amazon Prime memberships–and he certainly has the economic means to bring about mass destruction–then we can say his negative contribution is a hundred times worse than his positive contribution (say -100 utility).

So what is my belief about Bezos’ expected contribution to society after observing the evidence against him? Using the above distribution he still, on average, provides 0.96 utility to society.

Even though his downside is disproportionately negative (based on my assumptions) compared to his upside, my beliefs suggest that it’s still not really worth entertaining.
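The utility calculation itself is just a weighted average, sketched below in Python. The +1 and -100 payoffs come from the setup above; the 0.0005 probability plugged in is an illustrative point estimate rather than the simulated distribution, so the result won't match the averaged figure exactly. The function names are my own.

```python
def expected_utility(p_bad, good=1.0, bad=-100.0):
    """Average contribution to society given the probability he is up to no good."""
    return (1.0 - p_bad) * good + p_bad * bad


def break_even_probability(good=1.0, bad=-100.0):
    """Probability of the bad outcome at which expected utility hits zero."""
    return good / (good - bad)


print(expected_utility(0.0005))
print(break_even_probability())
```

With these payoffs the break-even point sits just under 1%, so my belief would have to climb by more than an order of magnitude before the expected contribution turned negative.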

Bayes’ Theorem

I wanted to start off my blog with something simple to dip my feet in the water, and so I’m going to go with Bayes’ Theorem. Here it is:

P(\theta|X) = \frac{P(X|\theta) P(\theta)}{P(X)}

This guy is single-handedly responsible for creating an entire branch of statistics, and it is so simple that its derivation is typically done in an introductory class on the topic (usually in the first couple of weeks when you’re going over the basics of probability theory). It’s not until you get into more advanced classes that you realize it has a lot more to say than how big the ratio is of the cross section of circles on a Venn diagram. When I was taught Bayes’ Theorem I just thought of it as a nifty little trick for converting P(A|B) statements to something that uses P(B|A). And as a student I said to myself “Cool, good enough to do my homework and pass a test. I’ll never see this again.”

Fast forward 2 years and I actually DO see it again. And surprise, it’s used for EVERYTHING. In fact, there is an entire field of mathematics dedicated to understanding its properties. It provides a new way of not just looking at statistics, or at probability, but at human knowledge! Bayes’ Theorem tells us to stop looking at our knowledge as some fixed property. The world may contain true and false statements about itself, but our knowledge about it is constantly fluctuating with new evidence, and we need to update our ideas about the world accordingly based on that evidence.

So let’s look back at the original theorem. I don’t like the way it’s usually written. Instead I’d like to make a small adjustment.

P(\theta|X) = \frac{P(X|\theta) P(\theta)}{P(X)}  = \frac{P(X|\theta) }{P(X)} P(\theta) = \text{BayesianAdjustment}(X,\theta) P(\theta)

My little improvement on Bayes’ Theorem consists of just highlighting the little adjustment factor given by  \frac{P(X|\theta) }{P(X)}. I believe this little ratio hasn’t been given the right amount of credit in the current literature. If we think of P(\theta) as our confidence in some statement, and P(\theta|X) as our updated level of confidence in that statement (where X is some evidence for or against it), then the adjustment factor should tell us exactly how much our beliefs should change.

So let’s think of \theta as some statement about the world. This could be anything, like say… “Jeff Bezos is an Illuminati shill.” Now personally I don’t believe this is true, but I like to think of myself as an open minded individual so I won’t completely rule it out. I will assign the accuracy of this statement some small probability. Let’s say P(\theta)=0.0001. So there is one chance in 10,000 that Jeff Bezos is definitely an Illuminati shill.

So how can you tell if someone is actually working for the Illuminati? Well, every once in a while they’ll throw out a hand signal (sort of like a low key gang sign). See exhibit A:


So one day Jeff Bezos is giving a keynote address and he decides to sit down. Lo and behold the camera gives him a quick glance and this creep is throwing out this ungodly hand sign, signaling his complicity in a hostile world-takeover by our Satanic overlords. Or he could’ve just randomly rested his hands there for no particular reason (like I said, I’m open minded). Let’s assign a value to these two possible explanations….

Let’s call the act of giving the hand signal our evidence X. Then the probability of Bezos giving the hand signal if he is an Illuminati member is P(X|\theta)=1.0, and the overall probability of him putting his hands there, member or not (usually for no particular reason), is P(X) = 0.2. Looking at these numbers by themselves the evidence seems pretty damning, but we still haven’t considered our prior assumptions about Jeff Bezos. Initially we pegged his odds of being a devil-worshipper at 1 in 10,000. Let’s plug all these into Bayes’ Theorem and see what our updated confidence in Jeff Bezos’ Illuminati complicity should be, given this new piece of evidence…

P(\theta|X) = \frac{P(X|\theta)}{P(X)} P(\theta)= \frac{1.0}{0.2} \times 0.0001= 0.0005
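The same calculation as a small Python sketch, split the way I like to write the theorem: an adjustment factor times the prior. The function names are my own, picked to mirror the notation above.

```python
def bayesian_adjustment(p_x_given_theta, p_x):
    """How strongly the evidence X scales our belief in theta."""
    return p_x_given_theta / p_x


def update(prior, p_x_given_theta, p_x):
    """Posterior P(theta | X) = adjustment factor * prior."""
    return bayesian_adjustment(p_x_given_theta, p_x) * prior


adjustment = bayesian_adjustment(1.0, 0.2)
posterior = update(prior=0.0001, p_x_given_theta=1.0, p_x=0.2)
print(f"adjustment factor = {adjustment:.1f}")
print(f"posterior         = {posterior:.4f}")
```

The hand signal makes the claim five times more believable than it was, which sounds dramatic until you remember that five times almost nothing is still almost nothing.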

Well, I definitely think he’s more likely to bring about the New World Order than I did before, but not by a significant enough margin to spout apocalyptic nonsense via HAM radio…

I could go on and on about this wonderful little equation. I can talk endlessly about how it is the most powerful epistemological statement in modern philosophy. But the fact is: you already use it in your everyday life. Perhaps not as precisely as you should, but you are using it loosely every time your beliefs change. Every time you are presented with some piece of evidence about the world and what is happening with it. Every time you’re not sure if Bitcoin is a good value after the last dip, or when you’re absolutely confident that all pickles taste like ass after your hundredth try. Next time you’re reading a news article think about how your beliefs are being updated, and to what degree and why.

Now I’d like to ask the reader: When was the last time you changed your mind about something? Can you assign numbers to your beliefs? If you would like to go beyond what I discussed, try asking yourself how robust your inferences are. How much do your posterior beliefs change based on your prior assumptions? Change up your values and come up with a basic “sensitivity” analysis.
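To get you started, here is a bare-bones Python sketch of that kind of sensitivity analysis: hold P(X|\theta) at 1.0 and see how the posterior moves as the prior and P(X) change. The grids of values are arbitrary picks for illustration; swap in your own.

```python
def posterior(prior, p_x_given_theta, p_x):
    """Bayes' Theorem: P(theta | X)."""
    return p_x_given_theta * prior / p_x


priors = [0.00001, 0.0001, 0.001, 0.01]
evidence_rates = [0.05, 0.2, 0.5]

# One row per prior, one column per assumed P(X).
print("prior     " + "".join(f"P(X)={e:<8}" for e in evidence_rates))
for prior in priors:
    row = "".join(f"{posterior(prior, 1.0, e):<13.5f}" for e in evidence_rates)
    print(f"{prior:<10}{row}")
```

Reading across a row shows how much rarer evidence moves you; reading down a column shows how much your starting belief matters.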

Let me see what you come up with!