Basic AR (Part 1)


The goal of this project is to find a plane in a video and project an image to fixed surface on that plane. Some of the techniques used in order to perform this task in real time include Hough Line detection to detect surface planes, SURF detection to detect interest points, basic KNN clustering to compare interest points between frames, RANSAC to determine the homography between frames, along with several optimization techniques in order to get the code to perform at the required 30 fps on HD video (1280×720 resolution).

The use of monocular AR combined with markerless tracking has been an actively researched topic since the mid-90s. Applications include consumer entertainment (especially on mobile and wearable devices), medical imaging, and architectural prototyping. A lot of modern AR research since the 2010s has been focused on implementing SLAM methodology.

All code and processing has been performed and tested on an Intel i5 2.3 GHz processor. All code has been written using Python 2.7 and OpenCV libraries.

Algorithm Outline

Below is a list of the steps performed. A lot of the implementation ideas have been borrowed from the paper Markerless Tracking using Planar Structures in the Scene (by Simon, Fitzgibbon, and Zisserman).

  • Initialization
    1. Set image I_1 \leftarrow (where the first frame of a video).
    2. Determine a surface plane in I_1.
    3. Compute a homography H_p that can be used to project onto the above surface plane.
    4. Project object A onto surface using H_p \times A
  • Loop
    1. Set I_0\leftarrow I_1 and I_1 \leftarrow
    2. Compute homography H_1 that projects I_0 to I_1.
    3. Update the image projection homography H_p \leftarrow H_w\times H_p.
    4. Project object A onto surface using H_p\times A

For the remained of this report,  H_w will be used to denote the homography between frames, and H_p will be used to denote the homography to project the ad onto a surface.

Plane Detection Using Hough Lines

To detect a plane upon initialization, a Hough Line detection method is used. First the edges are detected using a Canny Edge filter (i.e. a thresholded gradient of each image at each pixel). From there, pixels along each edge are converted from a point in Cartesian space to a curve in the Hough space (r,\theta). Intersections (or near intersections) of multiple curves in Hough space correspond to lines in the original Cartesian space.

I assume there is a grid-like structure on the plane to which I am projecting my image. To take advantage of this assumption I do the following: After enough lines are detected they are each clustered into their own groups based on their slope. From there I assume the two largest clusters determine perpendicular lines on a plane. I use lines from the top two clusters to find 4 points onto which I will project my image. Note that this method will fail if there are no dominant lines in the image, or if the two most dominant line directions are not perpendicular.


In the above images the left image would result in good surface detection. The right image would result in bad surface detection since the dominant lines do not determine a grid on a plane.

To be continued…

Counter-Intuitive Sampling Result

I’ve been reading ET Jaynes’ book on probability theory and encountered an interesting little result.

Consider the classic urn model where we are presented with a bucket containing 4 balls (2 red and 2 white) and we draw from this urn without replacement. Denote R_i to be the event that a red ball is picked from the urn on the ith draw. So P(R_1) would be interpreted as the probability of  picking a red ball on the first draw.

Now consider the following probabilities P(R_1|R_2) and P(R_1| R_2 \text{ or } R_3). For the second probability, we consider the case where the red ball is picked on the 2nd or 3rd draw (possibly both). Which probability do you suppose is greater? Think about this question for a second before continuing reading.

Let’s imagine that the sampling experiment has already taken place and you selected all the balls from the urn blindly. I tell you that the 2nd ball you selected was red. What do you suppose are the chances the first ball you drew was also red? This is easy to calculate…

P(R_1|R_2) = \frac{P(R_1, R_2)}{P(R_2)} = \frac{\frac{2}{4}\times\frac{2-1}{4-1}}{\frac{1}{2}} = \frac{1}{3}

This result easily matches our intuition. If 1 of the 3 remaining balls that is unaccounted for is red, we should expect to have a 1 in 3 chance of having selected it first.

Now consider the case where I tell you that either the 2nd or 3rd ball (possibly both) are red. Let’s calculate the chance that the first ball selected is also red…

P(R_1|R_2 \text{ or} R_3) = \frac{P(R_1 \text{ and } (R_2 \text{ or } R_3)) }{P((R_2 \text{ or } R_3) } = \frac{P(R_1 \text{ and } (R_2 \text{ or } R_3)) }{P(R_2) + P(R_3) - P(R2, R3)}  = \frac{ \frac{1}{2} \times \frac{2}{3} } {\frac{1}{2} + \frac{1}{2} - \frac{2}{4}\times\frac{2-1}{4-1} } = \frac{2}{5} 

Therefore P(R_1|R_2) < P(R_1| R_2 \text{ or } R_3). This result clashes with my own intuition. I would expect that knowledge of a red ball picked on the 2nd or 3rd draw would reduce my chances of having picked a red ball on the 1st draw, especially since R_2 \text{ or } R_3 includes the possibility of both red balls being selected on the 2nd and 3rd draws. How is it that the knowledge of a red ball possibly being drawn in an additional spot actually  increases our odds of selecting it on the first draw?

Here is how Jaynes attempts to intuit the phenomenon: The information R_2 reduces the number of red balls available for the first draw by one, and it reduces the number of balls in the urn available for the first draw by one, giving P(R_1|R_2) = (M-1)/(N-1) = 1/3. The information [ R_2 \text{ or } R_3 ] reduces the ‘[expected] number of red balls’ available for the first draw, but it reduces the number of balls in the urn available for the first draw by two.

So similarly to how we calculate P(R_1|R_2) =(2-1) / (4-1) =1/3, we should think of P(R_1|R_2 \text{ or} R_3)=(2-\langle R\rangle )/(4-2), where \langle R\rangle is the expected number of red balls removed when we know that the 2nd, 3rd, or both picks possibly withdrew a red ball. Let’s do that:

\langle R\rangle = \frac{1\times P(R_2,\bar{R_3}) + 1\times P(\bar{R_2},R_3) +2\times P(R_2,R_3)}{P(R_2,\bar{R_3}) + P(\bar{R_2},R_3) + P(R_2,R_3)} =  \frac{1\times \frac{1}{2}\frac{2}{3} + 1\times \frac{1}{2}\frac{2}{3} +2\times \frac{1}{2}\frac{1}{3}}{\frac{1}{2}\frac{2}{3} + \frac{1}{2}\frac{2}{3} +\frac{1}{2}\frac{1}{3}} = \frac{6}{5}

Thus we attain our desired result.

P(R_1|R_2 \text{ or } R_3)=\frac{2-\langle R\rangle}{4-2} =\frac{4/5}{2} = \frac{2}{5}

After encountering this result I couldn’t help but think about the famous Monty Hall problem. Both lead to seemingly contradictory results where probabilities change in seemingly counter-intuitive ways based on our knowledge. Is there really a relationship between the Monty Hall problem and the above result? Or is the connection tenuous at best?

Casino Math: Markov Chains


Understanding and applying Markov chains is an essential component of calculating probabilities in casino games that would otherwise become unwieldy. I have used Markov chains in calculating probabilities associated with popular slot features such as collection bonuses, sticky Wilds, and Lightning Link-style bonuses.

I typically use Markov chains in a games where there are a reasonable number of states the player can go through. The definition of reasonable depends on time constraints and computational processing power available. Because matrix multiplication is involved, the processing time grows cubically with respect to the number of states.

The Basics

To model a game as a Markov process we must define the following items:

  1. A well-defined state space \mathcal{S}=\{s_0, s_1, ..., s_N\}. This is simply the set of all the states the player can be in.
  2. A probability distribution of the initial state space, X_0. i.e. What is the initial probability of being in each state? X_0 is typically represented by a 1\times N row vector.
  3. The transition matrix, T. Each element of T defines the probability of moving from one state to another. For example, the probability of transitioning from state s_i to state s_j would be given by the ith row and jth column of T. Note that it is essential that T does not change from round to round.

From here we can easily determine the state probability distribution X_n at each step in the process:

 X_n = X_0 T^n

X_n is a 1\times N dimensional vector that represents the probability of being in each state after the nth step of the process.


Consider a game where a player is given an equal chance of starting with 1, 2 or 3 fair coins. At the beginning of each round all coins are flipped and every coin that flipped heads is removed. The game is played until all coins have been removed. As a prize for making it to each successive round the player is paid $1 at the beginning of the first round, $2 at the beginning of the second round, $3 at the beginning of the third, etc.

To model this game as a Markov process we first define all the states that player can be in at each round. The states are 1). no coins removed, 2). 1 coin removed, 3). 2 coins removed, and 4). all coins removed (or game over).

Since the game has an equal chance of starting with 1, 2, or 3 coins already removed, we define the initial state vector like so:

X_0 = \left[\frac{1}{3} \  \frac{1}{3} \ \frac{1}{3} \ 0 \right]

It usually takes a little more work to determine the transition matrix. For this game it is defined as follows:

T = \left[\begin{array}{cccc}\frac{1}{8} & \frac{3}{8} & \frac{3}{8} & \frac{1}{8} \\ 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\0 & 0 & \frac{1}{2} & \frac{1}{2} \\0 & 0 & 0 & 1 \\ \end{array} \right]

From here we can determine the state distribution vector at each round of the game…

X_n = X_0 T^n = \left[\frac{1}{3} \  \frac{1}{3} \ \frac{1}{3} \ 0 \right]  \left[\begin{array}{cccc}\frac{1}{8} & \frac{3}{8} & \frac{3}{8} & \frac{1}{8} \\ 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\0 & 0 & \frac{1}{2} & \frac{1}{2} \\0 & 0 & 0 & 1 \\ \end{array} \right]^n = \left[ p_{1,n} \ p_{2,n} \ p_{3,n} \ p_{4,n} \right]


\begin{array}{rl} p_{1,n} & = \frac{8^{-n}}{3} \\ p_{2,n} & =\frac{4^{-n}}{3}+\frac{1}{3}(34^{-n}-38^{-n}) \\ p_{3,n} & = \frac{4^{-n}}{3}+\frac{1}{3}(-2^{1-2n}+2^{1-n}) +\frac{1}{3}(-32^{1-2n}+32^{-n}+38^{-n}) \\ p_{4,n} & = \frac{1}{3}(1-2^{-n}) +\frac{1}{3}(1-2^{1-n}+4^{-n}) + \frac{1}{3}(1-32^{-n}+34^{-n}-8^{-n}) \\ \end{array}

Side note: In case you’re wondering on how to get a nice formula for T^n, you can take a look at this example. In my case, I cheated and used Mathematica. 🙂

These value x_{i,n} represents the probability of being in state s_i during round n. Note that based on the above equations it is clear that \lim_{n\rightarrow\infty}p_{4,n} = 1, implying that the game is guaranteed to end given enough time. A few more interesting properties of this game can be uncovered by analyzing these equations. For example, on average, how many rounds of this game can the player be expected to play?

\text{Expected rounds}=\sum_{n=1}^\infty (p_{1,n}+p_{2,n}+p_{3,n})=\frac{101}{63}\approx 1.603

How about the value of the game itself?

\text{Expected game value}=\sum_{n=1}^\infty (p_{1,n}+p_{2,n}+p_{3,n})n=\frac{4580}{1323}\approx\$3.46

What other interesting properties from this game can you discover by modeling the game as a Markov chain?

Jensen’s Inequality

The modern philosopher Nassim Nicholas Taleb says that one should never cross a river that is on average 4 feet deep. The intuition is clear: Averages are a borderline useless metric when the payoff structure (i.e. being in a certain depth of water) is non-linear. To put it another way, being in 8 feet of water is not twice as bad as being in 4 feet. Jensen’s Inequality captures this idea in a single elegant statement:

Let f be a convex function and X be an arbitrary random variable. Then

f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]

In the example above we could think of X to be water depth and f to be our relative discomfort. When pain is convex (disproportionately uncomfortable) with respect to some input, our expected pain is much worse than the pain of the average input.

This can be flipped into a more positive outcome. Think of how much more enjoyable it is to eat a single large meal in the evening than trying to micromanage your calorie intake with 10 smaller meals evenly spread out throughout the day. I guarantee your total enjoyment from all those little snacks won’t add up to be anywhere close to enjoyable as that dinner at the end of the day. Or to paraphrase Socrates, hunger is the best spice.

Try to think of more applications of Jensen’s Inequality. When else is the average utility greater than the utility of the average?