Summary
The adaptive Huffman algorithm as described by Vitter had three constraints that were compelling to relax.
The first is the size of the table. The size of the table was constrained based on initial knowledge of the set being compressed—typically constrained to 256 for a single byte alphabet size.
The NotYetTransmitted sequence never adjusted its weight based on its own frequency so clusters of new characters would not benefit from reduced sizes of this sequence.
Once new characters become rare, it makes sense to reduce the NotYetTransmitted weight and therefore demote it farther down the Huffman tree.
Source
The source for these changes can be found on GitHub. The implementation is C#. C/C++ implementations will be added in the future.
Relaxing the Size Constraint
The size constraint of the table was straightforward to relax. The ordering from root to leaves from the highest order to lowest was rearranged so the root was order zero and child nodes were assigned subsequent orders, from right to left (essentially the distance from the root order from the original implementation). Order corresponds to the index in the list tracking the table. If necessary, new entries could be added to the table without affecting the order of previous entries. The element of lowest weight (and highest order) was found at the end of the list.
Adjustable NotYetTransmitted Weight
To allow the NotYetTransmitted weight to be adjustable and participate in migrating through the tree, it is treated as any other leaf where its weight can increase when the sequence is encountered (or it needed to be emitted). The original implementation called for the NotYetTransmitted leaf to be where new nodes are inserted. The new algorithm only uses the leaf of lowest weight (found as the last element in the tree). The last element becomes an internal node and two new entries are added to the list. The first is the original entry, the second is the new entry with weight zero. Then the weight of the entry is incremented following the standard rules.
Reducing Weights
Eventually the NotYetTransmitted sequence will become rare to emit so retaining the weight and preventing other symbols from having shorter sequences is less desirable. The ability to decrement the weight of a node is necessary.
In the new implementation, increasing the weight requires ensuring the current node is of the lowest order in the same weight class (other than its parents) before incrementing the weight (and then repeating the process for each parent). Similarly, decreasing the weight requires ensuring the current node is of the highest order of the same weight class (other than its children) before decrementing the weight (and then repeating the process for each parent). While skipping parents does not require additional data structures more information is required for skipping children. In the latter case a queue is used to track the orders of the children. When a child is encountered it can be removed from the queue and its children can be added to the queue. Children will be added breadth-first, in-order.
Example
The following example uses the sequence “astrachan_” and increases the NotYetTransmitted weight whenever it needs to be emitted and decreases it when emitting a preexisting symbol. *NYT* represents the NotYetTransmitted symbol.
The format of each symbol node is
Node Index | Symbol | Weight |
---|---|---|
Bit Sequence |
The format of each internal node is
Node Index | Weight |
---|
Adding “a”
Adding “s”
Adding “t”
Adding “r”
Updating “a”
Adding “c”
Adding “h”
Updating “a”
Adding “n”
Adding “_”
Sample Generation
The samples above were generated by using the above library and code to generate DOT files. Then dot.exe was used to generate the corresponding png files.
The code to generate the DOT files based on the library is:
using System.IO; using OneOddSock.Compression.Huffman; namespace DotBuilder { internal class Program { private static void Main(string[] args) { if (!Directory.Exists(args[0])) { Directory.CreateDirectory(args[0]); } var compressor = new DynamicHuffman<char>(); int step = 0, stepPart = 0; char symbol = ':'; compressor.DotWriter = (string dotGraph) => File.WriteAllText( Path.Combine(args[0], string.Format("Step_{0}_{1}_{2}.gv", step, symbol, stepPart++)), dotGraph); foreach (char ch in args[1]) { ++step; symbol = ch; stepPart = 1; compressor.WriteCode(ch, (bool value) => { }, (char value) => { }); } } } }