Overview of Image Compression



Transform Coding

One dimensional signals are usually represented in terms of a set of basis functions. The Fourier transform has cosinusoids as its basis functions. We can expand the concept of transforms to 2 dimensions. An image can be represented in terms of a set of arrays called basis images.

The purpose of image coding is to represent the image with a fewer number of bits, while maintaing the visual quality of the image. Image transforms have been widely used to implement image coders. Image transforms are useful for coding, since images generally have a compact representation in the transform domain.

An image transform takes N*M pixel values in an image and transforms them into N*M transform values. The original image has its energy spread throughout the image. This means that the large pixel values are likely to be anywhere in the image. A "good" transform has a high degree of energy compaction. The large transform values are localized. The histogram of pixel values illustrates the fact that most pixels have very low values and a few pixels have large values. A large number of bits are allocated to these few transform coefficients. Very few bits are allocated to the small transform values. Thus most of the information in the image can be represented by a smaller number of total bits.

The compressed image can be obtained by applying the inverse transform to the transform values. The optimal transform is the one that minimizes the distortion between the original and compressed images. This transform is the Karhunen-Loeve transform. However the Karhunen-Loeve transform requires knowledge of the second order statistics of the image and is very difficult to implement. Since the second order statistics of the image are difficult to obtain, approximations to the Karhunen-Loeve transform are preferred in practice. One of these is the Discrete Cosine Transform (DCT). The DCT is a fast transform with complexity of O(NlogN). It gives real values which correspond to the frequency content of the signal. It works extremely well with highly correlated data. One versionof the DCT is implemented in the popular JPEG compression scheme. The wavelet transform is another excellent transform for image coding. Its results are closer to that of the K-L transform than the Discrete Cosine Transform.

Fractal Coding

The word "fractal" was first used by Benoit Mandelbrot in 1975. Fractals are shapes that exhibit the property of self similarity. A self similar shape has components, that resemble the whole shape. If an image is truly fractal in nature, the entire image can be reconstructed using a set of affine transformations of an arbitary figure. The arbitary figure could be a box, circle, triangle etc. Natural objects like trees, snowflakes etc. can have this fractal property. Fractals have a so called scaling symmetry. The fractal looks exactly the same at any scale. Thus, if you magnify a fractal shape, it will continue to look exactly the same.

Fractal shapes


Most everyday images are not self similar. However these images have a loose version of self similarity. Everyday images can be partially constructed from affine transformations of small parts of themselves. The image of "lenna" exhibits the idea of a "loose self similarity". She is standing in front of a mirror. There is a reflection of the hat in the mirror. The reflected portion can be obtained using an affine transformation of a small portion of her hat. Parts of her shoulder, are almost identical. The entire image cannot be made up of transforms of small parts. The fractal representation of an entire image will thus result in some error. Fractal image coding is lossy.

Lenna


Practical image coders divide an image into a series of non overlapping small blocks called ranges. The image is also divided into a series of large blocks called domains. Individual range blocks are considered in turn. A domain block along with a set of transformations is then associated with this range block. The domain block is found by cycling through all possible combinations of range blocks and transformations. The combination that produces the least mean square error is then selected. The procedure of finding transformations to represent a range block, is highly computationally internsive. The decoding involves starting from a block of ones. A set of stored transformations is applied to the block of ones to obtain a particular domain block. The image is stored as a collection of affine transformations. The potential bit savings are huge.