Basic AR (Part 1)


The goal of this project is to find a plane in a video and project an image to fixed surface on that plane. Some of the techniques used in order to perform this task in real time include Hough Line detection to detect surface planes, SURF detection to detect interest points, basic KNN clustering to compare interest points between frames, RANSAC to determine the homography between frames, along with several optimization techniques in order to get the code to perform at the required 30 fps on HD video (1280×720 resolution).

The use of monocular AR combined with markerless tracking has been an actively researched topic since the mid-90s. Applications include consumer entertainment (especially on mobile and wearable devices), medical imaging, and architectural prototyping. A lot of modern AR research since the 2010s has been focused on implementing SLAM methodology.

All code and processing has been performed and tested on an Intel i5 2.3 GHz processor. All code has been written using Python 2.7 and OpenCV libraries.

Algorithm Outline

Below is a list of the steps performed. A lot of the implementation ideas have been borrowed from the paper Markerless Tracking using Planar Structures in the Scene (by Simon, Fitzgibbon, and Zisserman).

  • Initialization
    1. Set image I_1 \leftarrow (where the first frame of a video).
    2. Determine a surface plane in I_1.
    3. Compute a homography H_p that can be used to project onto the above surface plane.
    4. Project object A onto surface using H_p \times A
  • Loop
    1. Set I_0\leftarrow I_1 and I_1 \leftarrow
    2. Compute homography H_1 that projects I_0 to I_1.
    3. Update the image projection homography H_p \leftarrow H_w\times H_p.
    4. Project object A onto surface using H_p\times A

For the remained of this report,  H_w will be used to denote the homography between frames, and H_p will be used to denote the homography to project the ad onto a surface.

Plane Detection Using Hough Lines

To detect a plane upon initialization, a Hough Line detection method is used. First the edges are detected using a Canny Edge filter (i.e. a thresholded gradient of each image at each pixel). From there, pixels along each edge are converted from a point in Cartesian space to a curve in the Hough space (r,\theta). Intersections (or near intersections) of multiple curves in Hough space correspond to lines in the original Cartesian space.

I assume there is a grid-like structure on the plane to which I am projecting my image. To take advantage of this assumption I do the following: After enough lines are detected they are each clustered into their own groups based on their slope. From there I assume the two largest clusters determine perpendicular lines on a plane. I use lines from the top two clusters to find 4 points onto which I will project my image. Note that this method will fail if there are no dominant lines in the image, or if the two most dominant line directions are not perpendicular.


In the above images the left image would result in good surface detection. The right image would result in bad surface detection since the dominant lines do not determine a grid on a plane.

To be continued…

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s