Understanding the LZ77 compression algorithm


For more information and source code, you can go to my git repository:



  • Definition
  • Features
  • How it works
  • Binary representation
  • Usage
  • Pseudo-Code
  • Bonus – Position Counting


The LZ77 algorithm is a lossless compression algorithm using a sliding window.

It can be found under different names such as Lempel Ziv 77 or LZ1.

Lossless compression is a compression method that restores the same data as the original after decompression.
A sliding window compression uses a search memory area that can be moved, which prevents the algorithm from being too resource-intensive.

Operation of the sliding window


As we have seen previously, the LZ77 algorithm uses a sliding window like the majority of algorithms in the LZ family.

This sliding window is split into two parts, one part which is the buffer and another which is the dictionary.

The buffer is the read part and the dictionary is the search part.

The features of the algorithm

The purpose of this algorithm is to encode a recurrence found in the buffer and the dictionary by the position/size combination.

Encoding scheme

How it works

The operation is simple, find the largest pattern of the stamp in the dictionary.

To illustrate this, let’s take an example:

In the buffer = “AABBCC”.

In the dictionary = “RRYYIIAANNMMXX

We start looking:

  2. AABBC
  3. AABB
  4. AAB

The distance from the found pattern to the playback buffer is calculated.

The length of the found pattern is 2 (AA) and the next character of the pattern in the buffer is B so for encoding we get (8, 2, B).

If no pattern is found, the encoding will be like this (0, 0, [first character of the buffer]), then if no pattern is found in the previous example, the encoding will be (0, 0, A).

To summarize, if a pattern is found ([distance], [size], [next character]) and if no pattern is found (0, 0, [first character of the buffer]).

How the LZ77 works

On small data, the encoded part can be larger than the original data.
But on large data such as Victor Hugo’s Les Misérables (Volume 1):

  • Original file: 710 kilobytes
  • Compressed file: 420 kilobytes

Either 40% compression.

Compression ratio

Binary representation

For the binary representation, the commonly used values are:

  • Dictionary: 12 bits or 4095.
  • Buffer: 4 bits or 15.

Either encoded on 2 bytes.

Binary representation


The LZ77 is no longer used, the LZSS shape will be preferred which is a great improvement in compression ratio but also in compression/decompression speed.


Here is a pseudo-code for the compression part:

Pseudo-code compression

And the decompression part:

Pseudo-code decompression

Bonus – Position Counting

There are mainly 2 forms of position counting.

What I call absolute that is counted from the beginning of the dictionaryExample with the sentence: “I am here and I am superman” of 29.

                                | — 8 — |

The encoded sentence : `I am here and (0,8)superman`.

                                | — 0

And the relative format counts from the current index. Example with the sentence: “I’m here and I’m superman” of 29.

                                | — 8 — |

The encoded sentence : `I am here and (0,8)superman`.

                                | —– 15 —– |

In most uses, the relative mode is used.

This Post Has 3 Comments

  1. bursa escort

    Wow, marvelous blog layout! How long have you been blogging for? you make blogging look easy. The overall look of your website is fantastic, as well as the content!| Maris Izak Devina

  2. erotik

    I am sure this piece of writing has touched all the internet viewers, its really really pleasant piece of writing on building up new blog.| Adi Godart Marci

  3. atk young and hairy

    Hey there. I found your web site by way of Google whilst looking for a related topic, your website got here up. It appears good. I have bookmarked it in my google bookmarks to come back then. Julina Morlee Beverlie

Leave a Reply