Compression (sneak preview of what you'll need for project 1)

Two major forms of compression: lossless and lossy.
Lossless: what goes in is what comes out.
Lossy: trade off "quality" for compression (audio, video, images).

Fundamental idea behind any compression: find a more compact
representation of the same information.

Huffman codes: a general-purpose method of efficiently representing
tokens based on their frequency. More frequent tokens need fewer bits
to represent them.
- Must be unambiguous (no codeword is a prefix of another).
- Good if it's "optimal" (more on that later in the semester).
- First, do frequency counts of all tokens.
- Algorithm (a short Python sketch appears after the Gzip discussion
  below):
  - Add all tokens + frequency counts into a set.
  - Loop until the set has only one element:
    - Remove the two elements with *minimum* frequency count.
    - Create a tree node with these as children.
    - Frequency count of the tree node = sum of its children.
    - Add it back into the set.
- The final tree is labelled with 0/1 for left/right transitions.
- To encode:
  - Look up the token in the tree; output the bits from the root down
    to the token.
- To decode:
  - Walk down the tree, following the bits until you hit a terminal.
  - Output that token, go back to the top, and repeat.

Important observation: Huffman codes require a minimum of one bit per
token. If you want better compression, you need to know more about the
structure of your data.

FAX machines:
- At 100 lines per inch, there are 850x1100 pixels (~1M) on a printed
  page.
- FAX communication: 9600 bits per second.
- Uncompressed, it would take ~100 seconds to transmit a page. (More,
  because they also want error checking.)
- Run-length encoding: small integers saying how long each run of black
  or white pixels is. Huffman codes for these integers (based on how
  frequently different run lengths occur, presumably measured on some
  large corpus of test documents).
- For the common case (a mostly white page with a little black), the
  compression can be 10:1.
- Question: what's the worst case?

Text compression: rather than letter-by-letter compression, we want to
take advantage of larger structures (e.g., words & phrases).
- Straw-man version: everybody has a copy of the dictionary. Every word
  has a number (its position in the dictionary). Compression is writing
  out a list of integers.
  - Problem 1: everybody needs the same (large) dictionary.
  - Problem 2: what if you want to support a foreign language?
- Goal: adaptively build the dictionary based on the input.
- Classic papers: Lempel-Ziv (1977) and Lempel-Ziv-Welch (1984). Gzip
  uses something related to LZ77, but not exactly the same.

Let's talk about how Gzip works, starting from the decompression side.
- At any given time, the decompressor already knows the text it has
  output so far. (Or a window of that text.)
- Decompression instructions either say:
  - output these N literal bytes, or
  - output N bytes we've seen before (at some distance back from the
    current position).
- A standard Huffman table (not even terribly optimized in Gzip) is
  used to represent these instructions efficiently.
- Results can vary widely:
  - 2x for many text documents.
  - Much more if it gets lucky.
  - At worst it typically just breaks even, even for random bits.

Okay, how about the compression side?
- A simple compressor keeps a hash table of character strings seen in
  the past and their offsets. Use the hash table to look up the current
  characters (e.g., 3 at a time) in the history. If not found, just
  output a literal character and move on. If found, any entry in the
  hash table could potentially be used with a "distance/length" token.
  (Both sides are sketched below.)
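First, the Huffman construction described at the top of these notes, as
a minimal Python sketch. It is only an illustration of that loop (the
name build_huffman is made up, ties are broken arbitrarily, and a heap
stands in for the "set" of pending elements), not anyone's production
coder.

    import heapq
    from collections import Counter

    def build_huffman(counts):
        """Build a Huffman code from {token: frequency} counts.
        Returns {token: bitstring}."""
        # Heap entries are (frequency, tiebreaker, tree); a tree is either
        # a bare token (leaf) or a (left, right) pair (internal node).
        heap = [(f, i, tok) for i, (tok, f) in enumerate(counts.items())]
        heapq.heapify(heap)
        seq = len(heap)

        # Loop until only one element is left: remove the two elements
        # with minimum count, join them under a new node whose count is
        # the sum of its children, and add it back in.
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)
            f2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, seq, (t1, t2)))
            seq += 1

        # Label left/right transitions with 0/1 to read off each code.
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                codes[tree] = prefix or "0"   # degenerate one-token case
        walk(heap[0][2], "")
        return codes

    codes = build_huffman(Counter("this is an example for huffman coding"))
    # Frequent tokens (like ' ') get short codes; rare tokens get long ones.

Encoding is then a table lookup per token; decoding walks the tree bit
by bit, exactly as described above.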
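Next, a toy version of the Gzip-style scheme just described: a greedy
compressor that hashes 3-byte strings to find earlier occurrences, and
a decompressor that only ever copies from text it has already output.
This is a sketch of the idea, not DEFLATE itself: real Gzip
Huffman-codes the literal and distance/length tokens, limits the window
to 32KB, and searches much harder. All names here are made up.

    def compress(data: bytes):
        """Greedy LZ-style compressor sketch. Emits a list of tokens:
        ('lit', byte) or ('copy', distance, length)."""
        table = {}            # 3-byte string -> most recent offset
        tokens = []
        i = 0
        while i < len(data):
            key = data[i:i + 3]
            j = table.get(key) if len(key) == 3 else None
            if j is None:
                tokens.append(("lit", data[i]))       # no match: literal
                if len(key) == 3:
                    table[key] = i
                i += 1
            else:
                # Found an earlier occurrence: extend the match as far
                # as it goes, then emit a distance/length token.
                length = 0
                while (i + length < len(data)
                       and data[j + length] == data[i + length]):
                    length += 1
                tokens.append(("copy", i - j, length))
                table[key] = i
                i += length
        return tokens

    def decompress(tokens) -> bytes:
        """Decompressor: only copies from text it has already output."""
        out = bytearray()
        for t in tokens:
            if t[0] == "lit":
                out.append(t[1])
            else:
                _, dist, length = t
                for _ in range(length):       # byte-at-a-time copy so
                    out.append(out[-dist])    # overlapping runs work
        return bytes(out)

    text = b"the quick brown fox jumps over the lazy dog, the quick brown fox"
    assert decompress(compress(text)) == text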
Trade-offs in how hard the compressor searches for a match:
- Matches closer to the front of the window get shorter Huffman codes.
- Matches further from the front may repeat more characters.
- More CPU spent searching --> improved compression.

Lossy compression:

Fundamental idea: the human visual / auditory system doesn't see every
pixel or hear every minute flutter of audio. You can make changes that
are "perceptually" indistinguishable.
- Images: eyes are less sensitive to high frequencies. Also, more
  sensitive to brightness variation than to color variation.
- Audio: some tones "mask" other tones. Your brain will reconstruct
  some missing harmonics. Etc.
- Video: like still images, but you also want to take advantage of
  similarities from one frame to the next.

Basics of JPEG compression:
- RGB -> YUV.
- Discrete Cosine Transform (a relative of the Fourier Transform):
  works on 8x8 pixel blocks and is invertible.
- Quantize higher frequencies more than lower frequencies.
- Quantize U/V more than Y (in MPEG video, U/V are also downsampled).
- Variable quantization == the JPEG "quality" knob.
- Zig-zag, RLE, Huffman: a lossless transformation. It assumes that
  quantization will leave many zeros behind in each JPEG block. (A
  small sketch of this step appears at the end of these notes.)
- Starting from 8 bits per pixel per channel, 24:1 compression still
  looks pretty good.

Assume a 6-megapixel camera (2K x 3K pixels).
- Uncompressed picture: 6M pixels * 3 bytes = 18MB.
- Photoshop "maximum" quality JPEG: 3MB (6:1 compression).
- Camera "fine" quality JPEG: 1.5MB (12:1 compression). You'd be hard
  pressed to see the difference.
- Even at 24:1 you might see some "ringing" or "blocking" artifacts,
  but only if you zoomed in.
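Finally, a sketch of the "zig-zag, RLE" step from the JPEG notes, to
show why it pays off: after quantization (divide each coefficient by
its step size and round), most high-frequency entries of an 8x8 block
are zero, and the zig-zag scan reads the block so those zeros bunch up
into long runs. The quantization table and the sample coefficient block
below are made up for illustration; real JPEG uses standard tables
scaled by the quality setting, and the DCT itself is omitted here.

    def zigzag_order(n=8):
        """(row, col) pairs of an n x n block in JPEG zig-zag order:
        walk the anti-diagonals, alternating direction."""
        return sorted(((r, c) for r in range(n) for c in range(n)),
                      key=lambda rc: (rc[0] + rc[1],
                                      rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def quantize(coeffs, quant):
        """Divide each DCT coefficient by its quantization step and round."""
        return [[round(coeffs[r][c] / quant[r][c]) for c in range(8)]
                for r in range(8)]

    def zigzag_rle(block):
        """Zig-zag scan, then encode as (zero_run, value) pairs, roughly
        the shape JPEG hands to its Huffman coder."""
        out, run = [], 0
        for r, c in zigzag_order():
            if block[r][c] == 0:
                run += 1
            else:
                out.append((run, block[r][c]))
                run = 0
        out.append("EOB")      # end-of-block stands for the trailing zeros
        return out

    # Made-up quantization table: small steps for low frequencies (top
    # left), large steps for high frequencies (bottom right).
    QUANT = [[1 + 4 * (r + c) for c in range(8)] for r in range(8)]

    # Made-up coefficient block with energy concentrated in the low
    # frequencies, which is typical of natural images after the DCT.
    COEFFS = [[600 * 0.4 ** (r + c) for c in range(8)] for r in range(8)]

    print(zigzag_rle(quantize(COEFFS, QUANT)))
    # Most of the 64 entries quantize to zero, so the output is just a
    # handful of (run, value) pairs followed by EOB.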