What Is Stereo Matching?

Robert Washbourne - 4 years ago - programming

Stereo matching, also known as disparity mapping, is an important area of computer vision. Curiosity, the Mars rover, uses stereo matching. So do Google’s self-driving cars, as well as quadcopters, helicopters, and other flying vehicles. Stereo matching is attractive because it needs only two cameras and no active sensors. It is easy to set up yourself, and not too hard to get running.

Stereo matching is one of a wide range of algorithms for finding depth. It is among the simplest ways to recover depth from two rectified images.

How does it work?

Stereo matching works by finding corresponding points in a pair of rectified images. Rectified means the images have been warped so that each point in the scene falls on the same horizontal row in both images, like this:

After the images are rectified, we run the main loop.

  1. Each image is split into strips 1 pixel high.

  2. Then, a window is slid along each strip to find the best match in the corresponding strip of the other image.

  3. The horizontal distance between each matched pair (the disparity) is recorded for every pixel, and the results are compiled into a new image, the disparity map.
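
To make the steps above concrete, here is a minimal sketch for a single pixel on a single strip. It assumes the sum of absolute differences as the similarity measure, and the names best_shift, half_window and max_shift are made up for this illustration rather than taken from the original program.

```python
def best_shift(left_row, right_row, x, half_window, max_shift):
    """Return the shift d (0..max_shift) whose window in right_row best matches
    the window centered at x in left_row (sum of absolute differences)."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_shift + 1):
        if x - d - half_window < 0:
            break  # the window would fall off the left edge of the right strip
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Toy strips: the same bump sits 2 pixels further left in the right strip.
left_row = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
right_row = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0]
print(best_shift(left_row, right_row, x=5, half_window=1, max_shift=3))  # prints 2
```

The search only goes leftward in the right strip because, in a rectified pair, a point seen in the left image appears at the same position or further left in the right image.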

The lighter parts of the disparity map are closer, and the darker parts are farther away. These maps are great because you can see details you may not have noticed before. On the right of the image, the painting is at a tilt, which I could not see in the plain image.
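
Lighter means closer because disparity is inversely proportional to depth. For a rectified pair the usual relation is Z = f * B / d, where f is the focal length in pixels, B is the distance between the two cameras, and d is the disparity. The sketch below is only an illustration: the focal length and baseline values are invented, not measured from the images in this article.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for a rectified pair: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, cameras 10 cm apart.
near = depth_from_disparity(40, focal_length_px=700.0, baseline_m=0.1)
far = depth_from_disparity(10, focal_length_px=700.0, baseline_m=0.1)
print(near, far)  # the larger disparity gives the smaller depth
```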

So how do we code this?

Disparity matching programs are difficult to write and very finicky, so I made a simpler program that still gives good results. Keep in mind that this simpler one may produce some noise: in areas where the image is very uniform, like the surface of a white wall, many windows look alike and the code will make some mistakes. To get the image data, we use scipy.misc.imread (removed in SciPy 1.2; on newer versions, imageio.imread is a common replacement).

We subtract the mean from each image so that a difference in overall brightness between the two views does not throw off the window comparison. We also shrink the images to make the program faster. These are the images I am using:
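
As a rough sketch of this preprocessing step, the function below converts an image array to grayscale, shrinks it, and subtracts the mean. It is a stand-in, not the original code: the name preprocess is made up, the resize is a plain nearest-neighbour pick done with NumPy indexing, and the 0.4 scale is the value mentioned later in this article.

```python
import numpy as np


def preprocess(image, scale=0.4):
    """Convert to grayscale floats, shrink by `scale`, and subtract the mean."""
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 3:
        img = img[..., :3].mean(axis=2)  # average the colour channels
    rows = max(1, int(round(img.shape[0] * scale)))
    cols = max(1, int(round(img.shape[1] * scale)))
    # Nearest-neighbour resize: pick evenly spaced source rows and columns.
    row_idx = (np.arange(rows) / scale).astype(int).clip(0, img.shape[0] - 1)
    col_idx = (np.arange(cols) / scale).astype(int).clip(0, img.shape[1] - 1)
    small = img[np.ix_(row_idx, col_idx)]
    return small - small.mean()
```

Working in floats also matters for the matching step: subtracting two unsigned 8-bit arrays would wrap around instead of going negative.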

This is the core of the program. We slide a window along a strip of one image and compare its values with a window slid along the same strip of the other image. The offsets of the best matches are appended to a list and used for distance. Now we need to run this function for every 1-pixel-high strip in the images.
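
One way the matching function and the loop over strips might look is sketched below. It is not the original program: the names match_row and disparity_map are invented, the cost is the sum of absolute differences, and the defaults assume that the (31,30) quoted later in this article mean a window width of 31 and a maximum shift of 30.

```python
import numpy as np


def match_row(left_row, right_row, window=31, max_shift=30):
    """Disparity for each pixel of one strip, by sum of absolute differences."""
    left_row = np.asarray(left_row, dtype=np.float64)
    right_row = np.asarray(right_row, dtype=np.float64)
    half = window // 2
    width = left_row.shape[0]
    disparities = np.zeros(width)  # border pixels are left at 0
    for x in range(half, width - half):
        patch = left_row[x - half : x + half + 1]
        best_d, best_cost = 0, np.inf
        # Only try shifts that keep the candidate window inside the strip.
        for d in range(min(max_shift, x - half) + 1):
            candidate = right_row[x - d - half : x - d + half + 1]
            cost = np.abs(patch - candidate).sum()
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities[x] = best_d
    return disparities


def disparity_map(left, right, window=31, max_shift=30):
    """Run match_row on every 1-pixel-high strip of a rectified pair."""
    return np.array(
        [match_row(l, r, window, max_shift) for l, r in zip(left, right)]
    )
```

Both loops are plain Python, so this is slow on full-size images, which is one reason to shrink the images first.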

The sys.stdout.write call updates a line of text with the progress on every iteration. Now we have to show the images. We use pyplot with subplots to show them.
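
The progress update can be sketched as follows. The carriage return moves the cursor back to the start of the line, so each write overwrites the previous one. The helper name report_progress and the message text are assumptions, not the original code.

```python
import sys


def report_progress(done, total, out=sys.stdout):
    """Overwrite the current terminal line with a percentage."""
    percent = 100 * done // total
    out.write("\rProcessing rows: %d%%" % percent)
    out.flush()  # without this, the text may sit in the buffer until the end


total_rows = 5
for row in range(total_rows):
    # ... match one strip here ...
    report_progress(row + 1, total_rows)
sys.stdout.write("\n")
```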

If you did all of this right, you get a lovely depth map when you run the code. If you fiddle with the values passed to the function, you can get better results. The default values are (31,30), and the images are resized to 0.4 of their original size.

More info

This simple approach is called template matching. For a more detailed explanation, see my article here.

The code used (Python) is available here.

More depth finding algorithms


Click on "stereo vision" to read papers and try demos. For example:

Bilaterally Weighted Patches

Cost Aggregation with Guided Filter