Compression (sneak preview of what you'll need for project 1)

Two major forms of compression: lossless and lossy.
Lossless: what goes in is what comes out.
Lossy: trade off "quality" for compression (audio, video, images).

Fundamental idea behind any compression: find a more compact
representation of the same information.

Huffman codes: a general-purpose method of efficiently representing
tokens based on their frequency. More frequent tokens need fewer bits
to represent them.
- Must be unambiguous (no codeword is a prefix of another).
- Good if it's "optimal" (more on that later in the semester).
- First, do frequency counts of all tokens.
- Algorithm (a short Python sketch appears after the Gzip discussion
  below):
  - Add all tokens + frequency counts into a set.
  - Loop until the set has only one element:
    - Remove the two elements with *minimum* frequency count.
    - Create a tree node with these as children.
    - Frequency count of the tree node = sum of its children.
    - Add it back into the set.
- The final tree is labelled with 0/1 for left/right transitions.
- To encode:
  - Look up the token in the tree; output the bits from the root down
    to the token.
- To decode:
  - Walk down the tree, following the bits until you hit a terminal.
  - Output that token, go back to the top, and repeat.

Important observation: Huffman codes require a minimum of one bit per
token. If you want better compression, you need to know more about the
structure of your data.

FAX machines:
- At 100 lines per inch, there are 850x1100 pixels (~1M) on a printed
  page.
- FAX communication: 9600 bits per second.
- Uncompressed, it would take ~100 seconds to transmit a page. (More,
  because they also want error checking.)
- Run-length encoding: small integers saying how long each run of black
  or white pixels is. Huffman codes for these integers (based on how
  frequently different run lengths occur, presumably measured on some
  large corpus of test documents).
- For the common case (a mostly white page with a little black), the
  compression can be 10:1.
- Question: what's the worst case?

Text compression: rather than letter-by-letter compression, we want to
take advantage of larger structures (e.g., words & phrases).
- Straw-man version: everybody has a copy of the dictionary. Every word
  has a number (its position in the dictionary). Compression is writing
  out a list of integers.
  - Problem 1: everybody needs the same (large) dictionary.
  - Problem 2: what if you want to support a foreign language?
- Goal: adaptively build the dictionary based on the input.
- Classic papers: Lempel-Ziv (1977) and Lempel-Ziv-Welch (1984). Gzip
  uses something related to LZ77, but not exactly the same.

Let's talk about how Gzip works, starting from the decompression side.
- At any given time, the decompressor already knows the text it has
  output so far. (Or a window of that text.)
- Decompression instructions either say:
  - output these N literal bytes, or
  - output N bytes we've seen before (at some distance back from the
    current position).
- A standard Huffman table (not even terribly optimized in Gzip) is
  used to represent these instructions efficiently.
- Results can vary widely:
  - 2x for many text documents.
  - Much more if it gets lucky.
  - At worst it typically just breaks even, even for random bits.

Okay, how about the compression side?
- A simple compressor keeps a hash table of character strings seen in
  the past and their offsets. Use the hash table to look up the current
  characters (e.g., 3 at a time) in the history. If not found, just
  output a literal character and move on. If found, any entry in the
  hash table could potentially be used with a "distance/length" token.
  (Both sides are sketched below.)
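First, the Huffman construction described at the top of these notes, as
a minimal Python sketch. It is only an illustration of that loop (the
name build_huffman is made up, ties are broken arbitrarily, and a heap
stands in for the "set" of pending elements), not anyone's production
coder.

    import heapq
    from collections import Counter

    def build_huffman(counts):
        """Build a Huffman code from {token: frequency} counts.
        Returns {token: bitstring}."""
        # Heap entries are (frequency, tiebreaker, tree); a tree is either
        # a bare token (leaf) or a (left, right) pair (internal node).
        heap = [(f, i, tok) for i, (tok, f) in enumerate(counts.items())]
        heapq.heapify(heap)
        seq = len(heap)

        # Loop until only one element is left: remove the two elements
        # with minimum count, join them under a new node whose count is
        # the sum of its children, and add it back in.
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)
            f2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, seq, (t1, t2)))
            seq += 1

        # Label left/right transitions with 0/1 to read off each code.
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                codes[tree] = prefix or "0"   # degenerate one-token case
        walk(heap[0][2], "")
        return codes

    codes = build_huffman(Counter("this is an example for huffman coding"))
    # Frequent tokens (like ' ') get short codes; rare tokens get long ones.

Encoding is then a table lookup per token; decoding walks the tree bit
by bit, exactly as described above.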
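Next, a toy version of the Gzip-style scheme just described: a greedy
compressor that hashes 3-byte strings to find earlier occurrences, and
a decompressor that only ever copies from text it has already output.
This is a sketch of the idea, not DEFLATE itself: real Gzip
Huffman-codes the literal and distance/length tokens, limits the window
to 32KB, and searches much harder. All names here are made up.

    def compress(data: bytes):
        """Greedy LZ-style compressor sketch. Emits a list of tokens:
        ('lit', byte) or ('copy', distance, length)."""
        table = {}            # 3-byte string -> most recent offset
        tokens = []
        i = 0
        while i < len(data):
            key = data[i:i + 3]
            j = table.get(key) if len(key) == 3 else None
            if j is None:
                tokens.append(("lit", data[i]))       # no match: literal
                if len(key) == 3:
                    table[key] = i
                i += 1
            else:
                # Found an earlier occurrence: extend the match as far
                # as it goes, then emit a distance/length token.
                length = 0
                while (i + length < len(data)
                       and data[j + length] == data[i + length]):
                    length += 1
                tokens.append(("copy", i - j, length))
                table[key] = i
                i += length
        return tokens

    def decompress(tokens) -> bytes:
        """Decompressor: only copies from text it has already output."""
        out = bytearray()
        for t in tokens:
            if t[0] == "lit":
                out.append(t[1])
            else:
                _, dist, length = t
                for _ in range(length):       # byte-at-a-time copy so
                    out.append(out[-dist])    # overlapping runs work
        return bytes(out)

    text = b"the quick brown fox jumps over the lazy dog, the quick brown fox"
    assert decompress(compress(text)) == text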
Trade-offs in how hard the compressor searches for a match:
- Matches closer to the front of the window get shorter Huffman codes.
- Matches further from the front may repeat more characters.
- More CPU spent searching --> improved compression.

Lossy compression:

Fundamental idea: the human visual / auditory system doesn't see every
pixel or hear every minute flutter of audio. You can make changes that
are "perceptually" indistinguishable.
- Images: eyes are less sensitive to high frequencies. Also, more
  sensitive to brightness variation than to color variation.
- Audio: some tones "mask" other tones. Your brain will reconstruct
  some missing harmonics. Etc.
- Video: like still images, but you also want to take advantage of
  similarities from one frame to the next.

Basics of JPEG compression:
- RGB -> YUV.
- Discrete Cosine Transform (a relative of the Fourier Transform):
  works on 8x8 pixel blocks and is invertible.
- Quantize higher frequencies more than lower frequencies.
- Quantize U/V more than Y (in MPEG video, U/V are also downsampled).
- Variable quantization == the JPEG "quality" knob.
- Zig-zag, RLE, Huffman: a lossless transformation. It assumes that
  quantization will leave many zeros behind in each JPEG block. (A
  small sketch of this step appears at the end of these notes.)
- Starting from 8 bits per pixel per channel, 24:1 compression still
  looks pretty good.

Assume a 6-megapixel camera (2K x 3K pixels).
- Uncompressed picture: 6M pixels * 3 bytes = 18MB.
- Photoshop "maximum" quality JPEG: 3MB (6:1 compression).
- Camera "fine" quality JPEG: 1.5MB (12:1 compression). You'd be hard
  pressed to see the difference.
- Even at 24:1 you might see some "ringing" or "blocking" artifacts,
  but only if you zoomed in.
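Finally, a sketch of the "zig-zag, RLE" step from the JPEG notes, to
show why it pays off: after quantization (divide each coefficient by
its step size and round), most high-frequency entries of an 8x8 block
are zero, and the zig-zag scan reads the block so those zeros bunch up
into long runs. The quantization table and the sample coefficient block
below are made up for illustration; real JPEG uses standard tables
scaled by the quality setting, and the DCT itself is omitted here.

    def zigzag_order(n=8):
        """(row, col) pairs of an n x n block in JPEG zig-zag order:
        walk the anti-diagonals, alternating direction."""
        return sorted(((r, c) for r in range(n) for c in range(n)),
                      key=lambda rc: (rc[0] + rc[1],
                                      rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def quantize(coeffs, quant):
        """Divide each DCT coefficient by its quantization step and round."""
        return [[round(coeffs[r][c] / quant[r][c]) for c in range(8)]
                for r in range(8)]

    def zigzag_rle(block):
        """Zig-zag scan, then encode as (zero_run, value) pairs, roughly
        the shape JPEG hands to its Huffman coder."""
        out, run = [], 0
        for r, c in zigzag_order():
            if block[r][c] == 0:
                run += 1
            else:
                out.append((run, block[r][c]))
                run = 0
        out.append("EOB")      # end-of-block stands for the trailing zeros
        return out

    # Made-up quantization table: small steps for low frequencies (top
    # left), large steps for high frequencies (bottom right).
    QUANT = [[1 + 4 * (r + c) for c in range(8)] for r in range(8)]

    # Made-up coefficient block with energy concentrated in the low
    # frequencies, which is typical of natural images after the DCT.
    COEFFS = [[600 * 0.4 ** (r + c) for c in range(8)] for r in range(8)]

    print(zigzag_rle(quantize(COEFFS, QUANT)))
    # Most of the 64 entries quantize to zero, so the output is just a
    # handful of (run, value) pairs followed by EOB.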