Compression and Security

Site index

Compression and Security

Build your own one-on-one compressor

Reverse Parsing - a sister scheme

The discussion here has considered parsing the file from top to bottom, and looking up words in the dictionary starting with their first letter and proceeding towards their last. In other wiords the direction of motion through the file and the direction of motion through words in the dictionary have both been "forwards".

This is not the only possible parsing scheme. An alternative where these two vectors point in opposite direction produces a different algorithm, with different constraints on the dictionary. However, it appears that the end result will be closely equivalent.

The algorithm will be presented anyway. It's alternative perspective may throw light on the workings of the technique, or provide other insights.

Start at the end of the file, work backwards, one character at a time, until either:
  • The start of the file is reached, or...
  • It is detected that the current character is the first letter of a dictionary entry.

In the first case, prepend these letters to the start of the output file, and stop.

In the second case, prepend a string corresponding to the matching dictionary entry for the symbol found, followed by any characters after the end of the dictionary entry found and before the end of the input file.

After this has been done, truncate the input file at the start of the dictionary entry just located and repeat - until the start of the input file has been located.

When using this algorithm, the "no substrings" condition (presented in the document about static dictionary compression) is no longer required.

However, in its place another condition must be used. This is that if one string in the dictionary is a prefix of another one, the shorter of the two strings must always be used.

The "leading strings never match trailing strings" condition (also presented in the document about static dictionary compression) must, however, remain in force.

Eliminating the "no substrings condition" means that less time need be spent "cleaning" the dictionary. On the other hand, any entries remaining in the dictionary which contain other entries as partial substrings wind up never being used - and are just dead flesh.

With a properly cleaned an efficient dictionary, the two schemes appear to be equivalent to one another. One remaining question is whether one approach may be implemened with greater computational efficiency than the other.


Site index | Compression and Security | Dictionary compression | One-on-one links | Site links


tim@tt1.org | http://mandala.co.uk/