CMPS 144L Fall 2019
Lab #13 (Week of Nov. 18): Heaps and Huffman Coding

The activities in this lab mimic problems that you are likely to see on either the upcoming test or the final exam. You can answer them using pen(cil) and paper, or you can put your answers in a plain text file and submit it.

Activity #1: Operations in Heaps

                 4 
                / \
               /   \
              /     \
             /       \
            /         \
           /           \
          /             \
         6               8 
        / \             / \
       /   \           /   \
      /     \         /     \
     /       \       /       \
    9        12     15        10
   / \      /  \
  /   \    /    \
14    23  18    13
Consider the min-heap shown above, assumed to represent a priority queue in which priorities increase as key values decrease. (Only the keys are shown and not any associated data.)

1. Show the contents of an array that represents the heap.

2. Carry out the following operations in succession, showing what the heap looks like after each operation has been completed. Recall that inserting an element involves adding a new leaf containing that element into the tree (at the next available leaf position) and then sifting up until its parent's key is smaller than or equal to its key (or until it reaches the root). Deleting the minimum involves placing the element of the "last" leaf into the root, removing that leaf from the tree, and then sifting down from the root.

(a) insert(11)
(b) insert(2)
(c) deleteMin()
(d) deleteMin()
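
The insert and deleteMin procedures recalled above can be sketched on the array representation as follows. This is only an illustrative sketch in Python, not the course's reference implementation: it assumes a 0-based array in level order, so that the children of position i sit at positions 2i+1 and 2i+2 and its parent at (i-1)//2, and the function names insert and delete_min are made up for the example. The sample heap in the code is a small hypothetical one, not the heap of this activity.

```python
def insert(heap, key):
    """Add key as the next leaf, then sift it up (heap is a list, modified in place)."""
    heap.append(key)
    i = len(heap) - 1
    # Sift up: stop at the root or once the parent's key is <= this key.
    while i > 0 and heap[(i - 1) // 2] > heap[i]:
        p = (i - 1) // 2
        heap[p], heap[i] = heap[i], heap[p]
        i = p


def delete_min(heap):
    """Remove and return the smallest key; assumes the heap is non-empty."""
    smallest = heap[0]
    last = heap.pop()          # remove the "last" leaf
    if heap:
        heap[0] = last         # place its element into the root
        i, n = 0, len(heap)
        while True:            # sift down from the root
            left, right = 2 * i + 1, 2 * i + 2
            child = i
            if left < n and heap[left] < heap[child]:
                child = left
            if right < n and heap[right] < heap[child]:
                child = right
            if child == i:     # neither child is smaller: done
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest


# Hypothetical example heap (level order): 1 at the root, children 3 and 2, ...
h = [1, 3, 2, 7, 5]
insert(h, 0)
print(h)
print(delete_min(h), h)
```

Printing the list after each call gives the level-order contents of the tree, which can be redrawn as a tree to check a hand-worked answer.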


Activity #2: Huffman Coding

Symbol     Frequency
a (000)    26%
b (001)     6%
c (010)     2%
d (011)    15%
e (100)    28%
f (101)     8%
g (110)     4%
h (111)    11%
 
Suppose that we wish to compress a file and that we are interpreting its contents as being a sequence of 3-bit blocks, which is to say that we are viewing it as being a string "over the alphabet"

{000, 001, 010, 011, 100, 101, 110, 111}

consisting of all bit strings of length three. For the sake of convenience, we will use the names a through h to refer to these eight "symbols", respectively. (Also for the sake of convenience, we will assume that the file's length, in bits, is divisible by three.)

The frequencies with which the symbols occur in the file are given by the table above.

(a) Build a Huffman Tree based upon the given frequencies.

(b) Use the Huffman Tree to assign codewords to the "symbols" a through h.

(c) Compute the ratio of the length of the resulting compressed version of the file to the length of the original file. This involves computing the average number of bits, per symbol occurrence, that would be used in encoding the file using the codewords. That average is the sum, taken over all eight symbols in the alphabet, of the product of the symbol's frequency and the length of its codeword. Because the original file uses three bits per symbol occurrence, the ratio is this average divided by three.
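
The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the course's prescribed method: the function names huffman_codewords and average_bits are invented for the example, ties between equal frequencies are broken by insertion order (other tie-breaking rules can yield different but equally good codewords), and the left branch is arbitrarily labeled 0. The frequencies used are for a hypothetical four-symbol alphabet of 2-bit blocks, not the table of this activity.

```python
import heapq
from itertools import count


def huffman_codewords(freqs):
    """Return {symbol: codeword} given {symbol: frequency}.

    Assumes symbols are not tuples; a tuple (left, right) marks an internal node.
    """
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    # Repeatedly merge the two trees of smallest total frequency.
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    _, _, root = heap[0]

    codes = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):          # internal node
            assign(tree[0], prefix + "0")    # left branch labeled 0 (a convention)
            assign(tree[1], prefix + "1")
        else:                                # leaf: record its codeword
            codes[tree] = prefix or "0"

    assign(root, "")
    return codes


def average_bits(freqs, codes):
    """Average codeword length per symbol occurrence, weighted by frequency."""
    total = sum(freqs.values())
    return sum(freqs[s] * len(codes[s]) for s in freqs) / total


# Hypothetical frequencies (in percent) for a 2-bit-block alphabet.
example = {"w": 50, "x": 25, "y": 15, "z": 10}
codes = huffman_codewords(example)
avg = average_bits(example, codes)
print(codes)
print(avg, avg / 2)  # average bits per symbol, and ratio to the 2-bit original
```

For the lab's alphabet of 3-bit blocks, the final division would be by 3 rather than 2.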