CMPS 144 Intersession 2023
Binary Codeword Tree Builder

Background

             *
           /   \
         /       \
       *           *
      / \         / \
     /   \       /   \
    *     *     *     *
   / \     \    E    /
  *   *     *       *
  A  / \    F      / \
    *   *         *   *
    G   C         D   B
A binary codeword tree

A : 000
B : 1101
C : 0011
D : 1100
E : 10
F : 011
G : 0010
The induced mapping

Huffman trees (the trees constructed during the execution of David Huffman's algorithm, which computes a mapping from symbols to binary codewords) are examples of a wider class of rooted binary trees that we will refer to here as binary codeword trees.


The figure above shows an example of such a tree. We imagine that the edges leading into left children are labeled by 0 and those leading into right children are labeled by 1. The sequence of labels on the edges along the path from the root node to a given node is referred to as that node's path label. For each leaf node, the path label of that node is taken to be the binary codeword of the symbol shown underneath that node.

In this way, a binary codeword tree defines/induces a mapping from a set of characters (namely those that are associated with the leaf nodes of the tree) to a set of binary codewords (namely those that are the path labels of the leaves). The example tree induces the mapping shown with it.

Note that, because only path labels of leaf nodes serve as codewords, it is not possible for a binary codeword tree to define a mapping in which one codeword is a proper prefix of another. (Reasoning: For two nodes to be such that one's path label is a proper prefix of another's, the former node must be a proper ancestor of the latter. But leaf nodes are never proper ancestors.)

It follows that the set of codewords produced by any binary codeword tree is uniquely decipherable, which is of vital importance.
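
The prefix property argued above can also be checked directly on a set of codewords. The following is only a minimal sketch: the class name PrefixCheck and the method isPrefixFree() are hypothetical and are not among the files provided with this assignment.

```java
import java.util.Arrays;
import java.util.List;

class PrefixCheck {

    /**
     * Returns true if no codeword in the list is a prefix of a different
     * entry in the list (hypothetical helper, for illustration only).
     */
    static boolean isPrefixFree(List<String> codewords) {
        for (int i = 0; i < codewords.size(); i++) {
            for (int j = 0; j < codewords.size(); j++) {
                // Compare every ordered pair of distinct entries.
                if (i != j && codewords.get(j).startsWith(codewords.get(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The codewords induced by the example tree.
        List<String> fromTree =
            Arrays.asList("000", "1101", "0011", "1100", "10", "011", "0010");
        System.out.println(isPrefixFree(fromTree));                   // prints true

        // "10" is a proper prefix of "101", so no codeword tree induces this set.
        System.out.println(isPrefixFree(Arrays.asList("10", "101"))); // prints false
    }
}
```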

One restriction that we could have placed upon binary codeword trees (but did not) is that all interior nodes must have two children. Indeed, all Huffman trees satisfy this property. Allowing a node to have only one child gives rise to unnecessarily long codewords. In our example, the codewords for characters F, D, and B could be shortened by one bit by getting rid of the two one-child interior nodes in the tree.

With respect to a character-to-codeword mapping, the problem of encoding is that of translating a sequence of characters into the concatenation of their codewords. The problem of decoding is the inverse, in which a concatenation of codewords is translated back into the corresponding sequence of characters.

The mapping defined by our example tree would encode the string DEAD as 1100100001100 (the concatenation of 1100 for D, 10 for E, 000 for A, and 1100 for D, respectively). Decoding translates in the opposite direction, which here is to say that 1100100001100 would decode to DEAD.

A binary codeword tree is a very valuable tool for decoding, but a poor one for encoding, because finding the codeword of a given character would require searching the tree for the leaf associated with that character. A much better tool to use for encoding is an array of type String[]. Such an array codeWords[] could define a character-to-codeword mapping like this:

Suppose character x (of type char) has bit string y (of type String) as its codeword. Then codeWords[x] should have y as its value.

In case it strikes you as odd that Java allows an array index to be of type char, note that values of type char are represented by integers, and the characters that typically occur in "plain text" files are represented by the integers in the range [32..127). Thus, for example, the character 'k' is represented by the integer 107, and so codeWords['k'] means the same thing as codeWords[107].
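
To make the array-based approach concrete, here is a minimal sketch of encoding with a codeWords[] array. It is not the provided Encoder class: the class name EncodeSketch, its methods, and the array length of 128 are assumptions made for illustration.

```java
class EncodeSketch {

    /** Encodes text by concatenating the codewords of its characters. */
    static String encode(String text, String[] codeWords) {
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char x = text.charAt(i);
            // A char used as an array index is widened to its integer value.
            if (x >= codeWords.length || codeWords[x] == null) {
                throw new IllegalArgumentException("No codeword for '" + x + "'");
            }
            bits.append(codeWords[x]);
        }
        return bits.toString();
    }

    /** Builds the mapping induced by the example tree (length 128 is an assumption). */
    static String[] exampleMap() {
        String[] codeWords = new String[128];
        codeWords['A'] = "000";
        codeWords['B'] = "1101";
        codeWords['C'] = "0011";
        codeWords['D'] = "1100";
        codeWords['E'] = "10";
        codeWords['F'] = "011";
        codeWords['G'] = "0010";
        return codeWords;
    }

    public static void main(String[] args) {
        System.out.println(encode("DEAD", exampleMap()));  // prints 1100100001100
    }
}
```

Each character costs one array access, which is why the array suits encoding better than the tree does.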

The problems addressed in this programming assignment are:

1. Building a binary codeword tree that defines the same character-to-codeword mapping as does a given array map[] (of type String[]), where, for example, map['w'] = "011001" means that the character w has 011001 as its codeword.
2. Making use of a binary codeword tree to decode a message that had been encoded using the mapping defined by the tree.

The Student's Task

Pertaining to the problem of building a binary codeword tree, provided are the following files:

BinaryCodeNode: An instance of this class represents a node within a binary codeword tree. Alternatively, you can think of such an object as representing the subtree of which that node is the root.

BinCodeTreeBuilder: An instance of this class builds a binary codeword tree (composed of instances of the BinaryCodeNode class) from a given array of type String[]. Of course, the goal is that the constructed tree describes the same character-to-codeword mapping as does the given array.
The tree is built in an incremental fashion using repeated calls to the augmentTree() method, each call to which adds whatever nodes are necessary to incorporate into the tree information regarding one (character, codeword) pair.
It is the augmentTree() method that is left for you to complete. The method contains hints (in the form of comments) to lead you in the right direction.

map0.txt, map1.txt, map2.txt, map3.txt: Sample data files that can be processed by the program described below for the purpose of testing your work.

MapReader: Java class that has a method for reading the description of a character-to-codeword mapping (from a file with a specified name) and returning a representation of that mapping in the form of a String[] array. Used by the "tester" program described below.

TreeBuilderTester: Java program intended to be used for testing the BinCodeTreeBuilder class. It reads a description of a character-to-codeword mapping from a data file whose name is specified by a run argument (as jGrasp calls it). It stores that mapping in an array of type String[] and passes the array to the constructor of the BinCodeTreeBuilder class, which, as mentioned above, makes repeated use of its augmentTree() method to construct a binary codeword tree representing the same mapping.
The mappings described by the array and the tree are displayed, and the user is responsible for checking that they are the same.
The figure below shows the output that the program should produce when given the contents of the map0.txt file as input. Notice that the first time the mapping is displayed (based upon the contents of the String[] array), it is in increasing order by the characters in the mapping's domain. When the mapping induced by the tree is displayed, however, it is in increasing lexicographic order by the codewords. The vital point is that the two mappings are the same.

Mapping described in input file:
A : 000
B : 1101
C : 0011
D : 1100
E : 10
F : 011
G : 0010

Mapping described by the constructed tree:
A: 000
G: 0010
C: 0011
F: 011
E: 10
D: 1100
B: 1101

To illustrate how the augmentTree() method works, suppose that on successive calls it was used to build the tree representing the mapping described in map0.txt (as shown above). The figures below show what the tree would look like initially and after each call to augmentTree().

Initially

             *

After adding A:000

             *
           /
         /
       *
      /
     /
    *
   /
  *
  A

After adding B:1101

             *
           /   \
         /       \
       *           *
      /             \
     /               \
    *                 *
   /                 /
  *                 *
  A                  \
                      *
                      B

After adding C:0011

             *
           /   \
         /       \
       *           *
      /             \
     /               \
    *                 *
   / \               /
  *   *             *
  A    \             \
        *             *
        C             B

After adding D:1100

             *
           /   \
         /       \
       *           *
      /             \
     /               \
    *                 *
   / \               /
  *   *             *
  A    \           / \
        *         *   *
        C         D   B


After adding E:10

             *
           /   \
         /       \
       *           *
      /           / \
     /           /   \
    *           *     *
   / \          E    /
  *   *             *
  A    \           / \
        *         *   *
        C         D   B

After adding F:011

             *
           /   \
         /       \
       *           *
      / \         / \
     /   \       /   \
    *     *     *     *
   / \     \    E    /
  *   *     *       *
  A    \    F      / \
        *         *   *
        C         D   B

After adding G:0010

             *
           /   \
         /       \
       *           *
      / \         / \
     /   \       /   \
    *     *     *     *
   / \     \    E    /
  *   *     *       *
  A  / \    F      / \
    *   *         *   *
    G   C         D   B

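
The step-by-step construction illustrated above can be sketched in code as follows. This is only one possible approach, written under stated assumptions: the Node class here is a hypothetical stand-in for BinaryCodeNode (whose actual fields and methods may differ), and augment(), build(), and collect() are illustrative names rather than the methods of BinCodeTreeBuilder.

```java
import java.util.ArrayList;
import java.util.List;

class TreeBuildSketch {

    /** Hypothetical node type; the provided BinaryCodeNode may differ. */
    static class Node {
        Node left, right;
        char symbol;   // meaningful only at leaves
        boolean isLeaf() { return left == null && right == null; }
    }

    /** Follows the path labeled by codeword from root, creating missing nodes. */
    static void augment(Node root, char ch, String codeword) {
        Node current = root;
        for (int i = 0; i < codeword.length(); i++) {
            if (codeword.charAt(i) == '0') {
                if (current.left == null) { current.left = new Node(); }
                current = current.left;
            } else {
                if (current.right == null) { current.right = new Node(); }
                current = current.right;
            }
        }
        current.symbol = ch;   // the node at the end of the path is the leaf for ch
    }

    /** Builds a tree from an array in which map[c] is the codeword of character c. */
    static Node build(String[] map) {
        Node root = new Node();
        for (int c = 0; c < map.length; c++) {
            if (map[c] != null) { augment(root, (char) c, map[c]); }
        }
        return root;
    }

    /** Lists "symbol: codeword" for each leaf, visiting left subtrees first. */
    static void collect(Node node, String pathLabel, List<String> out) {
        if (node == null) { return; }
        if (node.isLeaf()) {
            out.add(node.symbol + ": " + pathLabel);
            return;
        }
        collect(node.left, pathLabel + "0", out);
        collect(node.right, pathLabel + "1", out);
    }

    public static void main(String[] args) {
        String[] map = new String[128];   // length 128 is an assumption
        map['A'] = "000";  map['B'] = "1101"; map['C'] = "0011"; map['D'] = "1100";
        map['E'] = "10";   map['F'] = "011";  map['G'] = "0010";
        List<String> lines = new ArrayList<>();
        collect(build(map), "", lines);
        for (String line : lines) { System.out.println(line); }
    }
}
```

Because collect() visits left (0) subtrees before right (1) subtrees, the leaves are listed in increasing lexicographic order by codeword, which is the order in which the tester displays the mapping induced by the tree.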

Pertaining to the problem of decoding an encoded message by using a binary codeword tree, provided are the following files:

Encoder.java: An instance of this class applies a character-to-codeword mapping to encode messages.
Decoder.java: An instance of this class uses a binary codeword tree (that represents a character-to-binary-codeword mapping) to decode an encoded message. The decodingOf() method is left for the student to finish.
EncoderDecoderTester.java: Java program intended to be used for testing the Decoder class.
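
As a rough illustration of why a binary codeword tree suits decoding, the sketch below walks the tree one bit at a time and emits a character each time a leaf is reached. It is not the provided Decoder class: the Node class, insert(), and decode() are hypothetical names, and the real decodingOf() method may have a different signature.

```java
class DecodeSketch {

    /** Hypothetical node type; the provided BinaryCodeNode may differ. */
    static class Node {
        Node left, right;
        char symbol;   // meaningful only at leaves
        boolean isLeaf() { return left == null && right == null; }
    }

    /** Helper for building a small tree: adds the path for one codeword. */
    static void insert(Node root, char ch, String codeword) {
        Node current = root;
        for (int i = 0; i < codeword.length(); i++) {
            if (codeword.charAt(i) == '0') {
                if (current.left == null) { current.left = new Node(); }
                current = current.left;
            } else {
                if (current.right == null) { current.right = new Node(); }
                current = current.right;
            }
        }
        current.symbol = ch;
    }

    /** Decodes a bit string by repeatedly descending from the root to a leaf. */
    static String decode(Node root, String bits) {
        StringBuilder text = new StringBuilder();
        Node current = root;
        for (int i = 0; i < bits.length(); i++) {
            current = (bits.charAt(i) == '0') ? current.left : current.right;
            if (current == null) {
                throw new IllegalArgumentException("No codeword matches at bit " + i);
            }
            if (current.isLeaf()) {
                text.append(current.symbol);   // a complete codeword has been read
                current = root;                // start over for the next codeword
            }
        }
        if (current != root) {
            throw new IllegalArgumentException("Input ends in the middle of a codeword");
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Node root = new Node();
        insert(root, 'A', "000");
        insert(root, 'D', "1100");
        insert(root, 'E', "10");
        System.out.println(decode(root, "1100100001100"));  // prints DEAD
    }
}
```

No lookahead is needed: because no codeword is a proper prefix of another, reaching a leaf always marks the end of exactly one codeword.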

Submitting Your Work

Submit the files BinCodeTreeBuilder.java and Decoder.java to the appropriate Brightspace dropbox.
