An object at a given distance from the camera appears at different
locations in the left and right images; this offset is the disparity.
There is one particular distance at which an object appears at exactly the
same location in both images. Any closer object exhibits a positive
disparity: the right image shows it to the left of its position in the
left image. Any farther object exhibits a negative disparity: the right
image shows it to the right of its position in the left image. The
disparity is therefore directly related to the actual distance of the
object from the camera.

We find this disparity by computing the normalized cross-correlation
between the left and right images. Different similarity measures have been
used in the literature, but it has been shown that the zero-mean normalized
cross-correlation and the zero-mean sum of squared differences tend to give
better results. Because of the normalization, this estimate is independent
of differences in brightness and contrast between the two images.
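
The invariance to brightness and contrast can be checked with a small
sketch. The function name zncc and the sample window values below are
illustrative assumptions, not taken from the original implementation; the
windows are passed as flattened lists of intensities.

```python
import math

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size windows,
    given as flattened lists of intensity values. Returns a value in [-1, 1],
    or 0.0 when either window is constant (assumed convention)."""
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]          # subtracting the mean removes brightness offset
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # dividing by the norms removes contrast gain

window = [10, 40, 35, 80, 20, 60, 55, 90, 15]   # a 3 x 3 window, flattened
brighter = [2.0 * v + 30 for v in window]       # same pattern, gain 2 and offset 30
inverted = [255 - v for v in window]            # same pattern, contrast reversed

score_same = zncc(window, brighter)       # close to +1
score_inverted = zncc(window, inverted)   # close to -1
```

A gain and offset applied to one window leave the score essentially
unchanged, which is why the measure tolerates exposure differences between
the two cameras.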

The normalized cross-correlation of two windows can be written as follows
(with f and g being the intensity values of M x N image windows at a given
position in the left and right images). The variable d refers to the
disparity between the two images. In our case, the correlation is performed
along the epipolar line: for any point in the left image, the search window
in the right image is assumed to lie within d = [-w, w].
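
For reference, one standard way to write the zero-mean normalized
cross-correlation with the symbols above is sketched here. The window
indices m and n, the means \bar{f} and \bar{g}_d, and the sign of the shift
(n - d, chosen so that a positive d moves the right-image window to the
left, as described earlier) are assumptions rather than the original
notation.

```latex
\[
C(d) =
\frac{\displaystyle\sum_{m=1}^{M}\sum_{n=1}^{N}
      \bigl(f(m,n)-\bar{f}\bigr)\,\bigl(g(m,n-d)-\bar{g}_d\bigr)}
     {\sqrt{\displaystyle\sum_{m=1}^{M}\sum_{n=1}^{N}\bigl(f(m,n)-\bar{f}\bigr)^{2}
            \;\sum_{m=1}^{M}\sum_{n=1}^{N}\bigl(g(m,n-d)-\bar{g}_d\bigr)^{2}}},
\qquad -w \le d \le w,
\]
\[
\bar{f} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} f(m,n),
\qquad
\bar{g}_d = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} g(m,n-d).
\]
```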

where

The output of the cross-correlation is a 3D matrix. This diagram shows
its configuration:

This next diagram is a detail of the cross-section in the matrix. Notice
that the left image has a depth axis that lines up with the Z axis and the
right image has a depth axis that is along the diagonals. This is because
the cross-correlation kept the left image stationary, while the right image
moved. It is important to keep in mind that neither the left nor the right
correlation is more important than the other.
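
A minimal sketch of how such a correlation matrix could be filled in is
given below, assuming rectified images whose epipolar lines are image rows.
The helper names (zncc, window_at, correlation_cube), the window
half-size r, the synthetic test images, and the use of -1.0 for entries
whose window falls outside an image are all illustrative assumptions.

```python
import math

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two flattened windows."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0

def window_at(img, i, j, r):
    """Flattened (2r+1) x (2r+1) window centred on (i, j), or None if it leaves the image."""
    rows, cols = len(img), len(img[0])
    if i - r < 0 or i + r >= rows or j - r < 0 or j + r >= cols:
        return None
    return [img[y][x] for y in range(i - r, i + r + 1) for x in range(j - r, j + r + 1)]

def correlation_cube(left, right, w, r=1):
    """cube[i][j][k] is the score at left pixel (i, j) for disparity d = k - w.
    The left image stays fixed; the right-image window is moved along the row."""
    rows, cols = len(left), len(left[0])
    cube = [[[-1.0] * (2 * w + 1) for _ in range(cols)] for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            f = window_at(left, i, j, r)
            if f is None:
                continue                      # border pixel: leave the placeholder scores
            for k, d in enumerate(range(-w, w + 1)):
                g = window_at(right, i, j - d, r)   # positive d looks further left
                if g is not None:
                    cube[i][j][k] = zncc(f, g)
    return cube

# Synthetic pair: the right image is the left image shifted 2 pixels to the left,
# so every interior point has a true disparity of +2.
rows, cols, w = 7, 12, 3
left = [[(i * 37 + j * j * 11 + i * j * 7) % 97 for j in range(cols)] for i in range(rows)]
right = [[left[i][j + 2] if j + 2 < cols else 0 for j in range(cols)] for i in range(rows)]

cube = correlation_cube(left, right, w)
column = cube[3][5]   # depth column for left pixel (3, 5); index w + 2 holds d = +2
```

Fixing (i, j) and reading along the last index gives the depth column for a
left-image pixel; the same scores seen from a right-image pixel lie along a
diagonal, because the right-image column is j - d.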

The second stage is to extract the correct depth from the correlation
matrix. There is a depth corresponding to every pixel in the image, so
for each (i, j) point we get a depth column. This vector contains
correlation coefficients, and we want the depth of the best match. In the
simplest case, we take the location of the maximum value in the vector. A
more complicated but more reliable method is as follows:

Find each relative peak, defined as a point whose value is greater than or
equal to that of its 4-connected neighbors in the uncompressed correlation
cube.
If we apply multiscaling to the image, we can greatly improve our
ability to find the actual disparity: we consider a number of low-pass
images and redefine a peak as having a value greater than or equal to one
half the value of the strongest peak along each viewing direction. This
limits the search space by removing any peaks that are relatively weak.
The threshold of one half the value of the strongest peak was chosen
arbitrarily.
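
Both extraction strategies can be sketched for a single depth column. This
is a simplification under stated assumptions: peaks are tested only against
their two neighbors along the depth axis rather than the 4-connected
neighbors in the cube, no multiscaling is performed, and the function names
and sample scores are hypothetical.

```python
def best_disparity(column, w):
    """Simple case: disparity of the maximum score in a depth column.
    Index k in the column corresponds to disparity d = k - w."""
    k = max(range(len(column)), key=lambda idx: column[idx])
    return k - w

def strong_peaks(column, w, ratio=0.5):
    """Disparities of relative peaks whose score is at least
    ratio times the strongest peak in the column (one half by default)."""
    n = len(column)
    peaks = []
    for k, v in enumerate(column):
        left_ok = k == 0 or v >= column[k - 1]
        right_ok = k == n - 1 or v >= column[k + 1]
        if left_ok and right_ok:              # relative peak along the depth axis
            peaks.append(k)
    if not peaks:                             # only happens for an empty column
        return []
    strongest = max(column[k] for k in peaks)
    return [k - w for k in peaks if column[k] >= ratio * strongest]

# Correlation scores for d = -3 .. 3 at one pixel (made-up values).
scores = [0.10, 0.30, 0.20, 0.90, 0.40, 0.50, 0.35]
winner = best_disparity(scores, 3)      # 0
candidates = strong_peaks(scores, 3)    # [0, 2]; the weak peak at d = -2 is dropped
```

Keeping several strong candidates per pixel, instead of committing to the
single maximum, leaves room for a later stage to resolve ambiguous matches.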

Last modified: Tue May 11 00:09:54 CDT 1999