Arithmetic Coding

I have updated my github project with support for arithmetic coding. It uses the algorithm provided by Malte Clasen and Eric Bodden. It is an integer based encoder (32 bit unsigned).

I have made some changes to the original implementation to separate the statistical models more fully from the coder. This allows substituting models on a per symbol basis.

An example of this behavior is provided in the ArithmeticStream class (paralleling the compression classes in System.IO.Compression). This class uses two models: a zero order model and a new symbol model. The former is only initialized with two symbols (stream terminator and new character). The latter is initialized with all characters.

When a new symbol is encountered to compress the following steps occur:

The NewCharacter symbol from the zero order model is emitted.
The stats for the zero order model are updated.
The symbol is emitted from the new symbol model.
The model is updated to zero the weight of the symbol (since it will no longer go through this model again).
The zero order model is updated with the symbol. Future processing of this symbol will now go through the zero order model.

The code that does this is:

            for (int i = 0; i < count; ++i)
            {
                byte b = buffer[offset + i];
                if (_newCharacterModel.Emitted(b))
                {
                    _coder.Encode(b, _adaptiveModel, _stream.Write);
                }
                else
                {
                    _coder.Encode(ZeroOrderAdaptiveByteModel.NewCharacter, _adaptiveModel, _stream.Write);
                    _coder.Encode(b, _newCharacterModel, _stream.Write);
                    _adaptiveModel.Update(b);
                }
            }

Which goes through the following extension method on the ArithmeticCoder class:

        /// Encodes <paramref name="symbol"/> using the <paramref name="coder"/> with
        /// the provided <paramref name="model"/>.
        /// </summary>
        public static void Encode<TSymbolType>(this ArithmeticCoder coder, TSymbolType symbol, IModel<TSymbolType> model,
                                               WriteBitDelegate bitWriter)
            where TSymbolType : struct
        {
            // cumulate frequencies
            Range count = model.GetRange(symbol);

            // encode symbol
            coder.Encode(count, model.TotalFrequencies, bitWriter);

            // update model
            model.Update(symbol);
        }

I have rearranged some of the logic as well to be more amenable to using vector math in a port and reduce some conditional statements in the e1/e2 loop.

I still have a number of unit tests to write. I also want to see whether the lost range from the integer division makes a substantial difference to compression and how much work is involved to address it. Finally a C port is in the queue to do.

Brent Scriver

Where I scribble something other than code.

Leave a Reply Cancel reply