[Figure: a binary codeword tree with seven leaves, labeled A through G. Two of its interior nodes have only one child.]

The induced mapping:

| Character | Codeword |
|---|---|
| A | 000 |
| B | 1101 |
| C | 0011 |
| D | 1100 |
| E | 10 |
| F | 011 |
| G | 0010 |
Shown above is an example of such a tree. We imagine that the edges leading into left children are labeled by 0 and those leading into right children are labeled by 1. The sequence of labels on the edges along the path from the root node to a given node is referred to as that node's path label. For each leaf node, the path label of that node is taken to be the binary codeword of the symbol shown underneath that node.
In this way, a binary codeword tree induces a mapping from a set of characters (namely those that are associated with the leaf nodes of the tree) to a set of binary codewords (namely those that are the path labels of the leaves). The example tree induces the mapping shown with it.
Note that, because only path labels of leaf nodes serve as codewords, it is not possible for a binary codeword tree to define a mapping in which one codeword is a proper prefix of another. (Reasoning: For two nodes to be such that one's path label is a proper prefix of another's, the former node must be a proper ancestor of the latter. But leaf nodes are never proper ancestors.)
It follows that the set of codewords produced by any binary codeword tree is uniquely decipherable: a concatenation of codewords can be split back into codewords in only one way, which is what makes decoding possible.
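The prefix property can also be checked directly on a set of codewords, without looking at any tree. The following is a minimal sketch; the class name PrefixCheck and the method isPrefixFree() are hypothetical and are not among the files provided with the assignment.

```java
class PrefixCheck {
    /**
     * Returns true if no codeword in the array is a proper prefix of another.
     * (Hypothetical helper, written for illustration only.)
     */
    static boolean isPrefixFree(String[] codewords) {
        for (int i = 0; i < codewords.length; i++) {
            for (int j = 0; j < codewords.length; j++) {
                String shorter = codewords[i];
                String longer = codewords[j];
                // "Proper" prefix: strictly shorter, and the longer word starts with it.
                if (shorter.length() < longer.length() && longer.startsWith(shorter)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The codewords of the example mapping (A through G).
        String[] example = {"000", "1101", "0011", "1100", "10", "011", "0010"};
        System.out.println(isPrefixFree(example));                  // prints true
        System.out.println(isPrefixFree(new String[] {"0", "01"})); // prints false
    }
}
```

The pairwise comparison is quadratic in the number of codewords, which is of no concern for mappings of this size.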
One restriction that we could have placed upon binary codeword trees (but did not) is that all interior nodes must have two children. Indeed, all Huffman trees satisfy this property. Allowing a node to have only one child gives rise to unnecessarily long codewords. In our example, the codewords for characters F, D, and B could be shortened by one bit by getting rid of the two one-child interior nodes in the tree.
With respect to a character-to-codeword mapping, the problem of encoding is that of translating a sequence of characters into the concatenation of their codewords. The problem of decoding is the inverse, in which a concatenation of codewords is translated back into the corresponding sequence of characters.
The mapping defined by our example tree would encode the string DEAD as 1100100001100 (the concatenation of 1100 for D, 10 for E, 000 for A, and 1100 for D, respectively). Decoding translates in the opposite direction, which here is to say that 1100100001100 would decode to DEAD.
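To see why the absence of proper prefixes makes decoding unambiguous, here is a minimal sketch that decodes by accumulating bits until they match a codeword. It deliberately uses a plain lookup table (a HashMap) rather than a codeword tree, so it is not the tree-based approach this assignment asks for; the names PrefixDecoder, decode(), and exampleMapping() are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixDecoder {
    /** Decodes a bit string, assuming no codeword is a proper prefix of another. */
    static String decode(String bits, Map<String, Character> codeToChar) {
        StringBuilder out = new StringBuilder();
        StringBuilder pending = new StringBuilder(); // bits read since the last complete codeword
        for (int i = 0; i < bits.length(); i++) {
            pending.append(bits.charAt(i));
            Character ch = codeToChar.get(pending.toString());
            if (ch != null) {
                // The first match must be the right one, because no longer
                // codeword can begin with this one.
                out.append(ch);
                pending.setLength(0);
            }
        }
        if (pending.length() > 0) {
            throw new IllegalArgumentException("trailing bits are not a codeword: " + pending);
        }
        return out.toString();
    }

    /** The example mapping, inverted (codeword to character). */
    static Map<String, Character> exampleMapping() {
        Map<String, Character> m = new HashMap<>();
        m.put("000", 'A');
        m.put("1101", 'B');
        m.put("0011", 'C');
        m.put("1100", 'D');
        m.put("10", 'E');
        m.put("011", 'F');
        m.put("0010", 'G');
        return m;
    }

    public static void main(String[] args) {
        System.out.println(decode("1100100001100", exampleMapping())); // prints DEAD
    }
}
```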
A binary codeword tree is a very valuable tool for decoding, but a poor one for encoding, because finding a character's codeword would mean searching the tree for the leaf that holds that character. A much better tool for encoding is an array of type String[]. Such an array codeWords[] could define a character-to-codeword mapping like this:
Suppose character x (of type char) has bit string y (of type String) as its codeword. Then codeWords[x] should have y as its value.
In case it strikes you as odd that Java allows an array index to be of type char, note that values of type char are represented by integers, and the characters that typically occur in "plain text" files are represented by the integers in the range [32..127). Thus, for example, the character 'k' is represented by the integer 107, and so codeWords['k'] means the same thing as codeWords[107].
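As an illustration of the array-based mapping, the sketch below fills a String[] with the example mapping and encodes a string by indexing the array with each character. The array length of 128 and the names ArrayEncoder, encode(), and exampleCodeWords() are assumptions made for this example, not part of the provided files.

```java
class ArrayEncoder {
    /** Concatenates the codewords of the characters in message. */
    static String encode(String message, String[] codeWords) {
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < message.length(); i++) {
            char x = message.charAt(i);
            // A char used as an array index is treated as its integer value.
            String y = (x < codeWords.length) ? codeWords[x] : null;
            if (y == null) {
                throw new IllegalArgumentException("no codeword for character '" + x + "'");
            }
            bits.append(y);
        }
        return bits.toString();
    }

    /** The example mapping; entries for all other characters stay null. */
    static String[] exampleCodeWords() {
        String[] codeWords = new String[128]; // assumed size: enough for plain-text characters
        codeWords['A'] = "000";
        codeWords['B'] = "1101";
        codeWords['C'] = "0011";
        codeWords['D'] = "1100";
        codeWords['E'] = "10";
        codeWords['F'] = "011";
        codeWords['G'] = "0010";
        return codeWords;
    }

    public static void main(String[] args) {
        System.out.println(encode("DEAD", exampleCodeWords())); // prints 1100100001100
        System.out.println((int) 'k');                          // prints 107
    }
}
```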
The problems addressed in this programming assignment are
Pertaining to the problem of building a binary codeword tree, provided are the following files:
The tree is built in an incremental fashion using repeated calls to the augmentTree() method, each call to which adds whatever nodes are necessary to incorporate into the tree information regarding one (character, codeword) pair.
It is the augmentTree() method that is left for you to complete. The method contains hints (in the form of comments) to lead you in the right direction.
The mappings described by the array and the tree are displayed, and the user is responsible for checking whether they are the same.
The figure below shows the output that the program should produce when given the contents of the map0.txt file as input. The first time the mapping is displayed (based upon the contents of the String[] array), it is in increasing order by the characters in the mapping's domain. When the mapping induced by the tree is displayed, it is in increasing lexicographic order by the codewords. The two displays nevertheless describe the same mapping, which is the point of the check.
    Mapping described in input file:
    A : 000
    B : 1101
    C : 0011
    D : 1100
    E : 10
    F : 011
    G : 0010

    Mapping described by the constructed tree:
    A: 000
    G: 0010
    C: 0011
    F: 011
    E: 10
    D: 1100
    B: 1101
To illustrate how the augmentTree() method works, suppose that on successive calls it was used to build the tree representing the mapping described in map0.txt (as shown above). The figures below show what the tree would look like initially and after each call to augmentTree().
[Figures: the tree initially and after each successive call to augmentTree(), summarized here by the nodes that each call adds, identified by path label.]

| Stage | Nodes added (by path label) |
|---|---|
| Initially | the root only |
| After adding A:000 | 0, 00, 000 (leaf A) |
| After adding B:1101 | 1, 11, 110, 1101 (leaf B) |
| After adding C:0011 | 001, 0011 (leaf C) |
| After adding D:1100 | 1100 (leaf D) |
| After adding E:10 | 10 (leaf E) |
| After adding F:011 | 01, 011 (leaf F) |
| After adding G:0010 | 0010 (leaf G) |
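The incremental construction walked through above can be reproduced with a small, self-contained sketch. It is only one possible way to add nodes for a (character, codeword) pair; the class CodewordTreeSketch, its Node type, and the methods insert() and collect() are hypothetical, and the hints inside the provided augmentTree() method take precedence over anything shown here.

```java
import java.util.ArrayList;
import java.util.List;

class CodewordTreeSketch {
    /** Hypothetical node type; the provided classes may be organized differently. */
    static class Node {
        Node left, right;
        char symbol;        // meaningful only when hasSymbol is true
        boolean hasSymbol;
    }

    /** Adds whatever nodes one pair requires; returns how many nodes were created. */
    static int insert(Node root, char symbol, String codeword) {
        Node cur = root;
        int created = 0;
        for (int i = 0; i < codeword.length(); i++) {
            char bit = codeword.charAt(i);
            if (bit == '0') {
                if (cur.left == null) { cur.left = new Node(); created++; }
                cur = cur.left;
            } else if (bit == '1') {
                if (cur.right == null) { cur.right = new Node(); created++; }
                cur = cur.right;
            } else {
                throw new IllegalArgumentException("codeword must contain only 0 and 1: " + codeword);
            }
        }
        cur.symbol = symbol;   // the node reached by the full path label is the leaf
        cur.hasSymbol = true;
        return created;
    }

    /** Collects "symbol: pathLabel" for every leaf, visiting left subtrees first. */
    static void collect(Node node, String pathLabel, List<String> out) {
        if (node == null) return;
        if (node.hasSymbol) out.add(node.symbol + ": " + pathLabel);
        collect(node.left, pathLabel + "0", out);
        collect(node.right, pathLabel + "1", out);
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"A", "000"}, {"B", "1101"}, {"C", "0011"}, {"D", "1100"},
            {"E", "10"}, {"F", "011"}, {"G", "0010"}
        };
        Node root = new Node();
        for (String[] pair : pairs) {
            int created = insert(root, pair[0].charAt(0), pair[1]);
            System.out.println("After adding " + pair[0] + ":" + pair[1] + ", new nodes: " + created);
        }
        List<String> lines = new ArrayList<>();
        collect(root, "", lines);
        for (String line : lines) System.out.println(line);
    }
}
```

Because left subtrees are visited before right subtrees, the leaves come out in increasing lexicographic order by codeword (A, G, C, F, E, D, B), which is the order in which the program displays the mapping induced by the tree.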
Pertaining to the problem of decoding an encoded message by using a binary codeword tree, provided are the following files: