Return to Virtual Port Townsend

Taking the photographs

Image Warping

Correcting for radial distortion

Cylindrical Projection

Image Alignment

Image Blending

Finishing Touches

Test Images

I took all the images for Virtual Port Townsend between February 16th and 18th, 2002 with a Canon PowerShot A10 digital camera mounted on a tripod with a Kaidan head attached. The Kaidan head clicks every 20°, which allowed me to take 18 equally spaced images per 360° revolution. With the camera oriented sideways, I had about 50% overlap between successive images. This much overlap was necessary for automatic image alignment to work well.

For the first four panoramas I set the camera on "stitch-assist" mode. In this mode exposure time is not automatically adjusted between successive exposures, and thus there is very little color, brightness, and contrast variation from frame to frame. The only noticeable difference was typically between the first and last frame, as natural lighting tended to shift slightly during the time it took to snap all 18 frames in a sequence.

For the last two panoramas I did not use "stitch-assist" mode. Because some images were taken facing the sun, some away from it, and some in every direction in between, there was considerable exposure-time variation and the image colors did not match up. I had to adjust the color, brightness, and contrast of each image individually before stitching them together. Even with this adjustment, you can still easily see the boundaries between frames in the resulting panoramas.

In order to make the 18 frames fit together to form a panorama, the images had to be warped. I had to warp the images for two reasons: firstly to correct the inherent radial distortion of the camera, and secondly to convert from the camera's planar projection to the cylindrical projection of the panorama.

Doing an image warp requires a warp-function that maps coordinates from the source image to their corresponding coordinates in the warped destination image. There are two common methods of applying a warp function to an image. In forward warping each pixel from the source is copied to a location on the destination computed by the warp function. Although forward warping is usually easy to do, it has the disadvantage that not every pixel in the destination will necessarily have a mapping from a pixel in the source, resulting in missing pixels, or holes, in the resulting warp. Thus I was forced to use inverse warping.

In inverse warping each pixel on the destination image is colored by sampling the source image at a point computed from the warp function's inverse. If the sample-point falls between pixels, the destination pixel can be colored by interpolation. I used bi-linear interpolation, in which the pairs of pixel-values above and below the sample-point are first linearly interpolated based on the sample point's `x`-position, followed by linear interpolation of these results based on the sample point's `y`-position.

Inverse warping can only be used when the inverse of the warp function can be found, which fortunately is the case here.
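The inverse-warping loop with bi-linear interpolation can be sketched in C++ like this. It is a minimal grayscale version for illustration; the `Image` struct, function names, and the callable warp-inverse are assumptions of this sketch, not the original program's code:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Minimal grayscale image: row-major float pixels.
struct Image {
    int w, h;
    std::vector<float> pix;
    float at(int x, int y) const { return pix[y * w + x]; }
};

// Bi-linear sample at a fractional point (sx, sy): interpolate the two
// pixel pairs in x first, then interpolate those results in y.
float bilinear(const Image& img, float sx, float sy) {
    int x0 = (int)std::floor(sx), y0 = (int)std::floor(sy);
    int x1 = std::min(x0 + 1, img.w - 1), y1 = std::min(y0 + 1, img.h - 1);
    float fx = sx - x0, fy = sy - y0;
    float top = img.at(x0, y0) * (1 - fx) + img.at(x1, y0) * fx;
    float bot = img.at(x0, y1) * (1 - fx) + img.at(x1, y1) * fx;
    return top * (1 - fy) + bot * fy;
}

// Inverse warping: color each destination pixel by sampling the source
// at the point computed by the warp function's inverse (passed as a
// callable mapping destination coords to source coords).
template <class InvWarp>
Image inverseWarp(const Image& src, int dstW, int dstH, InvWarp inv) {
    Image dst{dstW, dstH, std::vector<float>(dstW * dstH, 0.f)};
    for (int y = 0; y < dstH; ++y)
        for (int x = 0; x < dstW; ++x) {
            float sx, sy;
            inv((float)x, (float)y, sx, sy);  // destination -> source
            if (sx >= 0 && sy >= 0 && sx <= src.w - 1 && sy <= src.h - 1)
                dst.pix[y * dstW + x] = bilinear(src, sx, sy);
        }
    return dst;
}
```

Because every destination pixel is visited exactly once, this structure cannot leave holes the way forward warping can.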

All consumer-grade cameras have a significant amount of radial distortion because the lens deviates from an ideal shape. This distortion must be removed to make panoramas; otherwise the images won't match up exactly. Here is an approximation for radial distortion:

`r`^{2} = `x`^{2} + `y`^{2}

`x`_{d} = `x` ( 1 + `k`_{1}`r`^{2} + `k`_{2}`r`^{4} )

`y`_{d} = `y` ( 1 + `k`_{1}`r`^{2} + `k`_{2}`r`^{4} )

In these equations `x` and `y` are undistorted image coordinates and `x`_{d} and `y`_{d} are distorted image coordinates. `k`_{1} and `k`_{2} are the coefficients of radial distortion, which are unique to the individual camera.

Knowing the coefficients of radial distortion for my camera at the particular focal length I was using, I was able to warp the images to remove most of the distortion.
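Conveniently, this model maps undistorted coordinates to distorted ones, which is exactly the direction an inverse warp needs: for each pixel of the corrected image, compute where it came from in the raw photo and sample there. A sketch in C++ (the function name and the use of centered, normalized coordinates are assumptions of this sketch):

```cpp
#include <cassert>
#include <cmath>

// Map undistorted, center-origin coordinates (x, y) to the distorted
// coordinates (xd, yd) at which the raw photo should be sampled.
// k1 and k2 are the camera-specific radial distortion coefficients.
void distort(double x, double y, double k1, double k2,
             double& xd, double& yd) {
    double r2 = x * x + y * y;               // r^2 = x^2 + y^2
    double s  = 1.0 + k1 * r2 + k2 * r2 * r2; // 1 + k1 r^2 + k2 r^4
    xd = x * s;
    yd = y * s;
}
```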

Even after correcting for radial distortion, the images wouldn't align with each other because they were planar projections of the scenes. The images needed to be projected onto a cylinder in order to fit together as a panorama.

Changing to a cylindrical projection required another image warp. Of course I needed the inverse of the cylindrical projection equation, not the equation itself, since I was using inverse warping. I computed the inverse of the equation for cylindrical projection with some simple trigonometry. It turns out to be:

`x` = `f` tan `θ`

`y` = `h` / cos `θ`

This converts cylindrical coordinates `θ`, `h` to planar coordinates `x`, `y`. The variable `f` is the focal length, which is also the radius of the cylinder.

The cylindrically projected images had almost perfect alignment. However, I wasn't about to manually align all those images! Instead I used Lucas and Kanade motion estimation techniques to find the misalignment vectors between successive frames and iteratively correct them. Lucas and Kanade motion estimation is based on the optical-flow equation, which assumes that, at a given image point, the component of the image motion along the image gradient, times the magnitude of the gradient at that point, equals the decrease in image intensity at that point. Here is the optical flow equation:

∇`I` · [`u` `v`] = - ∂`I` / ∂`t`

In this equation `I` is the image intensity, `t` is time, and `u` and `v` are the `x` and `y` components of the image motion.

There are two problems with this assumption. First, it only tells us the amount of motion in the direction of the gradient, not the amount of motion perpendicular to the gradient (we have two unknowns, `u` and `v`, and only this one equation). Second, it is only a linear approximation: the image gradient does not necessarily remain constant, after all.

To deal with the first deficiency, Lucas and Kanade motion estimation applies the optical-flow equation to a window of pixels. Hopefully the window will not have the same gradient everywhere, so we will have enough information to solve the equation. Actually, we will end up with an over-constrained system of linear equations. The best we can do with this is to find the solution that gives the least sum-squared error. Since we're dealing with only two unknowns this is very easy to do with some basic linear algebra.
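Writing the optical-flow equation for every pixel in the window and minimizing the sum-squared error gives 2×2 normal equations that can be solved directly. A sketch in C++, assuming the spatial gradients `Ix`, `Iy` and temporal differences `It` have already been computed for the window (names and layout are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One Lucas-Kanade step over a window: build the 2x2 normal equations
// [sum Ix^2   sum IxIy] [u]   [-sum IxIt]
// [sum IxIy   sum Iy^2] [v] = [-sum IyIt]
// and solve for the motion (u, v). Returns false when the system is
// (near-)singular, i.e. the gradients in the window all point one way.
bool lucasKanadeStep(const std::vector<double>& Ix,
                     const std::vector<double>& Iy,
                     const std::vector<double>& It,
                     double& u, double& v) {
    double a = 0, b = 0, c = 0, p = 0, q = 0;
    for (std::size_t i = 0; i < Ix.size(); ++i) {
        a += Ix[i] * Ix[i];
        b += Ix[i] * Iy[i];
        c += Iy[i] * Iy[i];
        p += Ix[i] * It[i];
        q += Iy[i] * It[i];
    }
    double det = a * c - b * b;
    if (std::fabs(det) < 1e-12) return false;
    u = (-p * c + q * b) / det;  // apply the 2x2 inverse to (-p, -q)
    v = ( p * b - q * a) / det;
    return true;
}
```

The singular case is exactly the aperture problem from the paragraph above: when every gradient in the window is parallel, the perpendicular motion component is unrecoverable.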

To deal with the fact that the optical flow equation gives us only a linear approximation, we only use it in cases where the amount of motion is very small, usually less than one pixel. However, we're dealing with images that are unaligned by perhaps tens or hundreds of pixels. The solution involves sub-sampling each image to create a Gaussian pyramid: a stack of images with the base image at full resolution and higher images at successively lower resolutions. We begin Lucas and Kanade motion estimation at the lowest resolution, where hopefully the amount of motion will be less than a pixel's width. Every time we move to a higher resolution, we translate the image to remove any misalignment we have already computed, thus ensuring that the amount of remaining misalignment is still less than one pixel's worth.
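Building one coarser pyramid level can be sketched as smoothing followed by halving. For brevity this sketch uses a simple 2×2 box average as a stand-in for the Gaussian smoothing a true Gaussian pyramid would use:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One coarser pyramid level from a w x h row-major image: each output
// pixel is the average of a 2x2 block (box filter standing in for the
// Gaussian smoothing). Assumes w and h are even.
std::vector<float> halve(const std::vector<float>& src, int w, int h) {
    std::vector<float> dst((w / 2) * (h / 2));
    for (int y = 0; y < h / 2; ++y)
        for (int x = 0; x < w / 2; ++x)
            dst[y * (w / 2) + x] =
                (src[(2 * y) * w + 2 * x]     + src[(2 * y) * w + 2 * x + 1] +
                 src[(2 * y + 1) * w + 2 * x] + src[(2 * y + 1) * w + 2 * x + 1]) / 4.f;
    return dst;
}
```

Each level halves any misalignment measured in pixels, so enough levels bring even a shift of hundreds of pixels down into the sub-pixel range where the linear approximation holds.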

Once I found the alignment vectors between successive images, all that was left to do was to put them all together into complete panoramas. However, it's nearly impossible to get images to line up absolutely perfectly for a panorama. There will always be small shifts and color changes between successive images. Thus, if the images are just pasted together without any blending, the borders between frames will be very apparent to the human eye due to the sudden alignment change and color shift.

The solution to this problem is to create a blend at the borders between frames, gradually fading one image into the next. The width of the blended region must be carefully adjusted for the blend to look natural. Using too wide a blend region results in obvious "ghosting", while using too narrow a blend region makes the transition too sudden and obvious.

I used an exponential blend function, reducing the blend weight exponentially with distance from the center of each image. The advantage of this technique was that it made the blended area always appear halfway between image centers, independent of actual image overlap, and at the same time made the blend width easy to control by changing the base of exponentiation. Here is the formula for determining the intensity of a pixel in the blended image:

`I` = ( `I`_{1}`b`^{-|`x`_{1}|} + `I`_{2}`b`^{-|`x`_{2}|} ) / ( `b`^{-|`x`_{1}|} + `b`^{-|`x`_{2}|} )

The variables are `I`, the blended image intensity; `I`_{1} and `I`_{2}, the image intensities of the two images to be blended; `x`_{1} and `x`_{2}, the `x`-coordinates from the two images; and `b`, the base of exponentiation, a number slightly larger than 1. In dealing with color images, the image intensities would actually be vectors containing the three color-components.
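A sketch of the per-pixel blend in C++. The weights are written with negative exponents so that a base `b` slightly greater than 1 makes each weight fall off with distance from that image's center; the function name and parameters are illustrative:

```cpp
#include <cassert>
#include <cmath>

// Exponential blend of one overlapping pixel. I1, I2 are the two image
// intensities; x1, x2 are the pixel's offsets from the respective image
// centers; b is slightly greater than 1. The weights are equal exactly
// halfway between centers, independent of how wide the overlap is.
double blend(double I1, double x1, double I2, double x2, double b) {
    double w1 = std::pow(b, -std::fabs(x1));
    double w2 = std::pow(b, -std::fabs(x2));
    return (I1 * w1 + I2 * w2) / (w1 + w2);
}
```

Raising `b` narrows the transition band; lowering it toward 1 widens the band, which is how the blend width is controlled.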

Generating panoramas that wrap around so that their left and right edges match took a little extra trickery. Instead of just blending 18 frames together in a row to create a panorama, I actually blended 19 images together, the last image a copy of the first. Then I cropped the blended image at the middle of the first and last frame so that it would begin and end at the same place.

Next I applied an affine transformation to the panorama. An affine transformation is basically a skew and/or a stretch plus a translation, and can be specified by a transformation matrix plus a translation vector. The affine transformation I applied was computed to skew the image so that it would meet itself when it wrapped around and shift the image so that the upper left-hand corner would be the origin.
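Applying an affine transformation to a point is just the matrix product plus the translation; a minimal sketch in C++ (the representation as a flat 2×2 array plus a vector is my own choice):

```cpp
#include <cassert>
#include <cmath>

// Affine transform of a point: [ox oy] = M * [x y] + t, where M is a
// row-major 2x2 matrix (skew/stretch) and t a translation vector.
void affine(const double M[4], const double t[2],
            double x, double y, double& ox, double& oy) {
    ox = M[0] * x + M[1] * y + t[0];
    oy = M[2] * x + M[3] * y + t[1];
}
```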

Finally I cropped the top and bottom off the images to get rid of raggedness. The result was beautiful 360° panoramic images. These images look really nice by themselves, although the warping due to the cylindrical projection is quite obvious. Viewing them in **LIVE PiCTURE** undoes the cylindrical projection for a more natural look, but I don't think they look as cool that way.

Some of the images contained moving objects such as people or vehicles. In cases where such objects crossed frame boundaries, I had to do some additional manual editing in a photo editing application to touch up the images. Mostly I cut objects from the individual warped images and pasted them into the panorama. Sometimes I had to modify the frames themselves in cases where large moving objects "confused" the image alignment algorithm.

The software to do the image processing described above is written in the C++ programming language and compiles in Microsoft Visual C++ (hopefully it should also work with a "real" compiler like the GNU C++ compiler). The program has a simple command-line interface that makes it quick and easy to go from raw images to completed panoramas.

The first image below is a test panorama I made to test the functionality of my completed panorama-generation software. The software had a difficult time aligning some of the images due to the extreme exposure variations between frames. The second image is the result of an experiment to determine the feasibility of generating panoramas from images taken with a handheld camera instead of using a tripod and Kaidan head. As you can see, the handheld sequence has some alignment problems because it's hard to rotate a camera about its vertical axis by hand to take a series of photographs without moving the camera in other dimensions in the process.

Test Sequence: University of Washington's HUB Lawn | Full Image

Handheld Camera: University of Washington's Sieg Hall, 2nd Floor | Full Image


All images and text copyright © 2002-2004 David Dewey. You may use these images on your own site if they are clearly credited to me and include a link to my website.

This page last modified March 30, 2004.