Overview of Image Compression
Transform Coding
One dimensional signals are usually represented in terms of a set of basis
functions. The Fourier transform has cosinusoids as its basis functions.
We can expand the concept of transforms to 2 dimensions. An image can be
represented in terms of a set of arrays called basis images.
The purpose of image coding is to represent the image with a fewer number
of bits, while maintaing the visual quality of the image. Image transforms
have been widely used to implement image coders. Image transforms are
useful for coding, since images generally have a compact representation in
the transform domain.
An image transform takes N*M pixel values in an image and transforms them
into N*M transform values. The original image has its energy spread
throughout the image. This means that the large pixel values are likely to
be anywhere in the image. A "good" transform has a high degree of energy
compaction. The large transform values are localized. The histogram of
pixel values illustrates the fact that most pixels have very low values and
a few pixels have large values. A large number of
bits are allocated to these few transform coefficients. Very few bits are
allocated to the small transform values. Thus most of the information in
the image can be represented by a smaller number of total bits.
The compressed image can be obtained by applying the inverse transform to
the transform values. The optimal transform is the one that
minimizes the distortion between the original and compressed images. This
transform is the Karhunen-Loeve transform. However the Karhunen-Loeve
transform requires knowledge of the second order statistics of the image
and is very difficult to implement. Since the second order statistics of
the image are difficult to obtain, approximations to the Karhunen-Loeve
transform are preferred in practice. One of these is the Discrete Cosine
Transform (DCT). The DCT is a fast transform with complexity of
O(NlogN). It gives real values which correspond to the frequency content of
the signal. It works extremely well with highly correlated data. One versionof the DCT is implemented in the popular JPEG
compression scheme. The wavelet transform is another excellent transform
for image coding. Its results are closer to that of the K-L transform than
the Discrete Cosine Transform.
Fractal Coding
The word "fractal" was first used by Benoit Mandelbrot in 1975. Fractals
are shapes that exhibit the property of self similarity. A self similar
shape has components, that resemble the whole shape. If an image is truly
fractal in nature, the entire image can be reconstructed using a set of affine
transformations of an arbitary figure. The arbitary figure could be a box,
circle, triangle etc. Natural
objects like trees, snowflakes etc. can have this fractal
property. Fractals have a so called scaling symmetry.
The fractal looks exactly the same at any scale. Thus, if you magnify a
fractal shape, it will continue to look exactly the same.
Fractal shapes
Most everyday images are not self similar. However these images have a loose
version of self similarity. Everyday images can be
partially constructed from affine transformations of small parts of
themselves. The image of "lenna" exhibits the idea of a "loose self
similarity". She is standing in front of a mirror. There is a reflection
of the hat in the mirror. The reflected portion can be obtained using an
affine transformation of a small portion of her hat. Parts of her
shoulder, are almost identical. The entire image
cannot be made up of transforms of small parts. The fractal representation of an entire image will thus result
in some error. Fractal image coding is lossy.
Lenna
Practical image coders divide an image into a series of non overlapping small blocks
called ranges. The image is also divided into a series of large blocks
called domains. Individual range
blocks are considered in turn. A domain block along with a set of
transformations is then associated with this range block. The domain block
is found by cycling through all possible combinations of range blocks and
transformations. The combination that produces the least mean square error
is then selected. The procedure of finding transformations to represent a
range block, is highly computationally internsive. The decoding involves
starting from a block of ones. A set of stored transformations is applied
to the block of ones to obtain a particular domain block. The image is
stored as a collection of affine transformations. The potential bit savings
are huge.