Arithmetic Coding

I have updated my github project with support for arithmetic coding.  It uses the algorithm provided by Malte Clasen and Eric Bodden.  It is an integer based encoder (32 bit unsigned).

I have made some changes to the original implementation to separate the statistical models more fully from the coder.  This allows substituting models on a per symbol basis.

An example of this behavior is provided in the ArithmeticStream class (paralleling the compression classes in System.IO.Compression).  This class uses two models: a zero order model and a new symbol model.  The former is only initialized with two symbols (stream terminator and new character).  The latter is initialized with all characters.

When a new symbol is encountered to compress the following steps occur:

  • The NewCharacter symbol from the zero order model is emitted.
  • The stats for the zero order model are updated.
  • The symbol is emitted from the new symbol model.
  • The model is updated to zero the weight of the symbol (since it will no longer go through this model again).
  • The zero order model is updated with the symbol.  Future processing of this symbol will now go through the zero order model.
The code that does this is:
            for (int i = 0; i < count; ++i)
            {
                byte b = buffer[offset + i];
                if (_newCharacterModel.Emitted(b))
                {
                    _coder.Encode(b, _adaptiveModel, _stream.Write);
                }
                else
                {
                    _coder.Encode(ZeroOrderAdaptiveByteModel.NewCharacter, _adaptiveModel, _stream.Write);
                    _coder.Encode(b, _newCharacterModel, _stream.Write);
                    _adaptiveModel.Update(b);
                }
            }
Which goes through the following extension method on the ArithmeticCoder class:
        /// Encodes <paramref name="symbol"/> using the <paramref name="coder"/> with
        /// the provided <paramref name="model"/>.
        /// </summary>
        public static void Encode<TSymbolType>(this ArithmeticCoder coder, TSymbolType symbol, IModel<TSymbolType> model,
                                               WriteBitDelegate bitWriter)
            where TSymbolType : struct
        {
            // cumulate frequencies
            Range count = model.GetRange(symbol);

            // encode symbol
            coder.Encode(count, model.TotalFrequencies, bitWriter);

            // update model
            model.Update(symbol);
        }
I have rearranged some of the logic as well to be more amenable to using vector math in a port and reduce some conditional statements in the e1/e2 loop.

I still have a number of unit tests to write.  I also want to see whether the lost range from the integer division makes a substantial difference to compression and how much work is involved to address it.  Finally a C port is in the queue to do.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.