Huffman Tree

I had never heard of the Huffman Tree prior to seeing this problem in the Daily Coding Problem challenge. 


This seems like a fairly easy algorithm, however in my opinion it is only an easy algorithm if you have prior knowledge of this or have memorized it.

Given a dictionary of character frequencies, build a Huffman tree, and use it to determine a mapping between characters and their encoded binary strings.

Here is the full description of the problem

According to problem description, this is classified as easy, however I am not sure if this classification is based on having prior knowledge of the problem, or it is based on attempting this problem from scratch. I did attempt to solve this problem, but gave up half way through and had to read up on what a Huffman Tree was (wikipedia)

Understanding the problem

The problem does not give an example of character frequencies. The presentation of the example tree followed by what "cats" would be encoded as does provide some indication of the expected result, but providing a table with the frequencies of c, a, t and s would have been more clearer.

A table of "cats" would look as follows

and the solution of the problem is two fold -

  • Construct a tree based on the table above
  • Use the tree and determine the binary encoding for each character

The main complexity (in my opinion) is constructing the Huffman Tree.

The second part of the problem is straight forward and it involves tree traversal to find every leaf node. During the course of the traversal, we can construct the encoding depending on the rules specified in the problem

When traversing the path, descending to a left child corresponds to a 0 in the prefix, while descending right corresponds to 1.

First Thoughts

  • I would need a custom data structure to represent each character in the tree - the node, and this also had to contain the frequency value.
  • I needed a PriorityQueue (Java) or min heap to keep the characters sorted and that would allow me to construct the tree.

Pitfalls


  • I missed the part about this being a BINARY TREE !
  • My instinct was to sort the table in descending order because this would allow me to build the tree having the highest occurrent character at the left most side of the tree. Whilst this is correct, it meant my algorithm would have to be constantly re-sorting each subtree to ensure that the highest occurring character wasn't buried deep in the tree as it traversed the list of characters
  • I think at point I got stuck not really knowing how to get past having to re-sort the tree as I went down the table, so I paused and did some more research on Huffman Trees. This article helped me understand the algorithm behind constructing the tree.

Constructing the Huffman Tree

Armed with some more information regarding arranging the values in the binary tree, here are the steps

  • All characters in the min heap are Tree nodes
  1. Pop the 2 least occurring nodes
  2. These two nodes are now a pair- lowest on the left and highest on the right
  3. The pair has a new parent, a parent whose value is the sum of both the pairs
  4. Push the new parent back into the heap
  5. Repeat these steps until there is only one node left in the heap
  • The remaining node is the root to our Huffman Tree