Foreword
For more information and source code, you can go to my git repository:
https://github.com/fbonhomm/LZ77
Summary
- Definition
- Features
- How it works
- Binary representation
- Usage
- Pseudo-Code
- Bonus – Position Counting
Definition
The LZ77 algorithm is a lossless compression algorithm using a sliding window.
It can be found under different names such as Lempel Ziv 77 or LZ1.
Lossless compression is a compression method that restores the same data as the original after decompression.
A sliding window compression uses a search memory area that can be moved, which prevents the algorithm from being too resource-intensive.
Features
As we have seen previously, the LZ77 algorithm uses a sliding window like the majority of algorithms in the LZ family.
This sliding window is split into two parts, one part which is the buffer and another which is the dictionary.
The buffer is the read part and the dictionary is the search part.
The purpose of this algorithm is to encode a recurrence found in the buffer and the dictionary by the position/size combination.
How it works
The operation is simple, find the largest pattern of the stamp in the dictionary.
To illustrate this, let’s take an example:
In the buffer = “AABBCC”.
In the dictionary = “RRYYIIAANNMMXX
We start looking:
- AABBCC
NOT FOUND - AABBC
NOT FOUND - AABB
NOT FOUND - AAB
NOT FOUNDAA
FOUND RRYYIIAANNMMXX
The distance from the found pattern to the playback buffer is calculated.
“RRYYIIAANNMMXX” = 8
The length of the found pattern is 2 (AA) and the next character of the pattern in the buffer is B so for encoding we get (8, 2, B).
If no pattern is found, the encoding will be like this (0, 0, [first character of the buffer]), then if no pattern is found in the previous example, the encoding will be (0, 0, A).
To summarize, if a pattern is found ([distance], [size], [next character]) and if no pattern is found (0, 0, [first character of the buffer]).
On small data, the encoded part can be larger than the original data.
But on large data such as Victor Hugo’s Les Misérables (Volume 1):
- Original file: 710 kilobytes
- Compressed file: 420 kilobytes
Either 40% compression.
Binary representation
For the binary representation, the commonly used values are:
- Dictionary: 12 bits or 4095.
- Buffer: 4 bits or 15.
Either encoded on 2 bytes.
Usage
The LZ77 is no longer used, the LZSS shape will be preferred which is a great improvement in compression ratio but also in compression/decompression speed.
Pseudo-Code
Here is a pseudo-code for the compression part:
And the decompression part:
Bonus – Position Counting
There are mainly 2 forms of position counting.
What I call absolute that is counted from the beginning of the dictionaryExample with the sentence: “I am here and I am superman” of 29.
| — 8 — |
The encoded sentence : `I am here and (0,8)superman`.
| — 0
And the relative format counts from the current index. Example with the sentence: “I’m here and I’m superman” of 29.
| — 8 — |
The encoded sentence : `I am here and (0,8)superman`.
| —– 15 —– |
In most uses, the relative mode is used.
Wow, marvelous blog layout! How long have you been blogging for? you make blogging look easy. The overall look of your website is fantastic, as well as the content!| Maris Izak Devina
I am sure this piece of writing has touched all the internet viewers, its really really pleasant piece of writing on building up new blog.| Adi Godart Marci
Hey there. I found your web site by way of Google whilst looking for a related topic, your website got here up. It appears good. I have bookmarked it in my google bookmarks to come back then. Julina Morlee Beverlie