Compression and Security

Site index

Compression and Security

Build your own one-on-one compressor

Adaptive dictionary-based compression

The discussion so far has focussed on the use of a static dictionary. However there is no reason why an adaptive system should not be used with the scheme.

Dynamic modifications to the dictionary may be permitted - provided they can be made at corresponding points in the compression and decompression processes.

In practice this means that a system which modifies the dictionary in ways which are based entirely on the decompressed text preceeding the current position will work.

Two main operations are required:

  • Adding new dictionary entries;
  • Deleting dictionary entries.

Deleting dictionary entries will help to reduce the space they take up if they recur in the future.

Adding dictionary entries during compression takes the form of verbatim copying of them into the output stream. Next the word is checked to make sure it does not clash with any existing dictionary entries. If it is a legal entry, it is next added to the dictionary, with a matching "short infrequent" symbol, taken from an ordered stack of these. If this stack is empty, infrequently used dictionary entries may be deleted and their symbols reused.

During decompression, if a non-dictionary word is spotted, which would have been added during the compression phase, it is again added to the dictionary.

Deleting entries whose "short, infrequent" corresponding entries are cropping up in the pre-compressed text helps when compressing data that does not conform well to the existing dictionary.

When compressing, deleting words, is a case of copying the corresponding expanded dictionary entry into the compressed plaintext *once*, and then removing the symbol. It's matching word may be recycled with a symbol from the stack of spare symbols, or with a symbol of an infrequently-used word.

When decompressing, if any expanded dictionary entries are found in the input stream, they are translated into their matching entries, and then excised from the dictionary, using the same operation applied when compressing.

The scheme described above destroys the symmetry between compression and decompression phases.


Site index | Compression and Security | Dictionary compression | One-on-one links | Site links


tim@tt1.org | http://mandala.co.uk/