### DATA COMPRESSION STUDY GUIDE

**Terminology and Basics**

- Why compress? To save space *and* time.
- How does compression work? Compression takes advantage of structure within data.
- What sort of data does not compress well? Random data.
- Lossy compression can further reduce file sizes by throwing away unimportant information.
- What is the compression ratio?
- Why can no compression algorithm possibly compress all bitstreams?
- What fraction of bitstreams can be compressed in half by a general-purpose algorithm?
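
As a rough illustration of the points above, the sketch below uses Python's standard `zlib` module (a stand-in for any general-purpose compressor, not something this guide prescribes) to compare compression ratios on highly repetitive bytes and on random bytes. The ratio is taken here as compressed size divided by original size. The helper names `compression_ratio` and `max_fraction_halved` are hypothetical. The second helper applies a counting argument: there are only 2^(k+1) - 1 bitstreams of length at most k, so at most that many of the 2^n bitstreams of length n can shrink to k = n/2 bits or fewer.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


def max_fraction_halved(n: int) -> float:
    """Upper bound on the fraction of n-bit streams that any lossless
    scheme can map to outputs of at most n // 2 bits."""
    k = n // 2
    short_outputs = 2 ** (k + 1) - 1   # bitstreams of length 0..k
    return short_outputs / 2 ** n      # each input needs its own output


structured = b"ab" * 5000          # lots of structure to exploit
random_bytes = os.urandom(10_000)  # no structure to exploit

print("structured ratio:", compression_ratio(structured))
print("random ratio:    ", compression_ratio(random_bytes))
print("bound for n=1000:", max_fraction_halved(1000))
```

The random input typically comes out slightly larger than it went in, which is consistent with the claim that no algorithm can shrink every bitstream.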

**Run-length coding**
- Takes advantage of repeated binary digits.
- How do you handle runs of length more than $2^M$?
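
One common way to handle long runs can be sketched as follows, assuming each count is stored in M bits (so the largest storable count is 2^M - 1) and that runs alternate between 0s and 1s, starting with 0s. When a run exceeds the maximum, the encoder emits the maximum count, then a zero-length run of the other bit, and keeps counting. The function names and the use of a `'0'`/`'1'` string in place of a real bitstream are illustrative choices.

```python
def rle_encode(bits: str, m: int = 8) -> list[int]:
    """Encode a '0'/'1' string as alternating run lengths, each < 2**m.

    Assumption for this sketch: the first count always refers to 0s.
    """
    max_run = 2 ** m - 1
    counts = []
    current = "0"
    run = 0
    for b in bits:
        if b != current:
            # The run ended: record it and start counting the other bit.
            counts.append(run)
            run = 0
            current = b
        elif run == max_run:
            # The run is too long for one count: flush it, then record a
            # zero-length run of the other bit so the alternation holds.
            counts.append(run)
            counts.append(0)
            run = 0
        run += 1
    counts.append(run)
    return counts


def rle_decode(counts: list[int]) -> str:
    """Invert rle_encode by expanding alternating runs, starting with 0s."""
    out = []
    bit = "0"
    for c in counts:
        out.append(bit * c)
        bit = "1" if bit == "0" else "0"
    return "".join(out)


# Seven 0s with 2-bit counts (maximum run 3) need two zero-length runs.
print(rle_encode("0000000", m=2))  # [3, 0, 3, 0, 1]
```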

**Huffman coding**
- Basic idea: Variable length codewords to represent fixed length characters.
More common characters are represented by shorter codewords.
- What is a prefix-free code?
Why is it important that Huffman coding use a prefix-free code?
Would encoding work with a non-prefix-free code? Would decoding work?
- Why is it necessary to transmit the coding trie?
Why don't we have to do something similar with run-length encoding or LZW?
- Why do we typically use an array for encoding and a trie for decoding?
- You do not need to know the specifics of the binary representation of the Huffman trie.
However, you should conceptually understand the idea of transmitting/reading
the trie using a preorder traversal.
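
The ideas in this section can be tied together with a minimal sketch, assuming 8-bit characters, at least two distinct characters in the input, and `'0'`/`'1'` strings standing in for a real bitstream. The `Node` class and all function names are hypothetical, and the preorder format (a `1` followed by the character for a leaf, a `0` for an internal node) is one possible convention rather than a required one. Encoding looks codewords up in a table, while decoding walks the trie bit by bit.

```python
import heapq
from collections import Counter
from itertools import islice


class Node:
    def __init__(self, ch=None, left=None, right=None):
        self.ch, self.left, self.right = ch, left, right

    def is_leaf(self):
        return self.left is None and self.right is None


def build_trie(text):
    # Heap entries are (frequency, tie-breaker, node); the unique
    # tie-breaker ensures two Node objects are never compared.
    freqs = sorted(Counter(text).items())
    heap = [(f, i, Node(ch)) for i, (ch, f) in enumerate(freqs)]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)   # the two least frequent subtries
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tick, Node(None, a, b)))
        tick += 1
    return heap[0][2]


def build_table(node, prefix="", table=None):
    # Codeword = path from the root (0 = left, 1 = right) to the leaf.
    table = {} if table is None else table
    if node.is_leaf():
        table[node.ch] = prefix
    else:
        build_table(node.left, prefix + "0", table)
        build_table(node.right, prefix + "1", table)
    return table


def write_trie(node):
    # Preorder: this node first, then the left subtrie, then the right.
    if node.is_leaf():
        return "1" + format(ord(node.ch), "08b")
    return "0" + write_trie(node.left) + write_trie(node.right)


def read_trie(bits):
    it = iter(bits)

    def read():
        if next(it) == "1":
            return Node(chr(int("".join(islice(it, 8)), 2)))
        left = read()
        right = read()
        return Node(None, left, right)

    return read()


def encode(text, table):
    return "".join(table[ch] for ch in text)


def decode(bits, root):
    out, node = [], root
    for b in bits:
        node = node.left if b == "0" else node.right
        if node.is_leaf():   # prefix-free: a leaf ends exactly one codeword
            out.append(node.ch)
            node = root
    return "".join(out)


text = "ABRACADABRA!"
root = build_trie(text)
bits = encode(text, build_table(root))
print(len(bits), decode(bits, read_trie(write_trie(root))))
```

Because the decoder only sees the output of `write_trie` plus the encoded bits, the sketch also shows why the trie has to travel with the message.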

**LZW**
- Basic idea: Fixed length codewords to represent variable length strings.
Longer strings that recur in the input come to be represented by a single codeword.
- Why do we typically use a trie for encoding and an array for decoding?
- How do you handle the 'strange' case
(where a codeword is seemingly not in the table during decoding)?
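
The sketch below shows one way to handle the 'strange' case, under several simplifying assumptions: codewords are plain Python integers with no fixed width, there is no end-of-file codeword, codes 0 to 255 are the single characters, and a `dict` stands in for the encoding trie. The function names are hypothetical. During decoding, the only codeword that can be missing is the one about to be added, and its string must be the previous string followed by that string's own first character.

```python
def lzw_compress(text: str) -> list[int]:
    # Assumption for this sketch: every character has a code below 256.
    table = {chr(i): i for i in range(256)}
    next_code = 256
    out = []
    w = ""                      # longest match found so far
    for ch in text:
        if w + ch in table:
            w += ch             # keep extending the match
        else:
            out.append(table[w])
            table[w + ch] = next_code   # new entry: match + next character
            next_code += 1
            w = ch
    if w:
        out.append(table[w])
    return out


def lzw_expand(codes: list[int]) -> str:
    if not codes:
        return ""
    table = [chr(i) for i in range(256)]   # an array indexed by codeword
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            # The 'strange' case: the encoder used the entry it had just
            # created, so the entry starts with prev and ends with prev[0].
            entry = prev + prev[0]
        else:
            raise ValueError("invalid codeword")
        out.append(entry)
        table.append(prev + entry[0])
        prev = entry
    return "".join(out)


codes = lzw_compress("ABABABA")
print(codes)              # [65, 66, 256, 258]
print(lzw_expand(codes))  # ABABABA
```

In this run, codeword 258 reaches the decoder before the decoder has created entry 258, which is exactly the situation the bullet above describes.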

### Recommended Problems

#### C level

- Fall 2011 Final, #10b (LZW)
- Spring 2015 Final (LZW)
- Fall 2012 Final, #12 (Huffman)
- Spring 2008 Final (Huffman)
- Spring 2012 Final, #10 (BW)
- Textbook 5.5.3

#### B level

- Fall 2011 Final, #10a (Huffman)
- Textbook 5.5.13
- Textbook 5.5.17

#### A level

- Fall 2012 Final, #13