Understanding the Knuth-Morris-Pratt algorithm

You can find a source code at this address:

The Knuth-Morris-Pratt or KMP algorithm is a substring search algorithm.

It makes a pre-processing on the substring by creating an array with the jumps to be made to avoid useless research.

The algorithm is divided in 2 parts, a phase of construction of the jump table and a phase of research.

Construction of the jump table [Code table].

For participate in parachute (24), the table is -1 0 0 0 0 0 -1 0 2 0 0 0 0 0 -1 0 0 3 0 0 0 0 0 0 (25)

The first letter is p, so at each p, we enter -1 in the table.

The second letter is a, so at each pa, we write -1 0 in the table, which is the size of the pattern.

The third letter is r, so at each par, we write -1 0 0 in the table, which is the size of the pattern.

We put 0 for each combination ended by the length or the first combination as:

  • -1 0 0 0 0 0 0 -1 == particip (first combination partici)
  • -1 0 0 0 0 0 0 -1 0 2 == participat (the 2 means the 2 before the t)

For par we write -1 0 0, if the next one is t then -1 0 0 0 otherwise -1 0 0 3 (except the beginning of course)

Then:

  • -1 0 0 0 0 0 0 == partici
  • -1 0 2 == pat
  • 0 0 0 0 0 == e in
  • -1 0 0 3 == para
  • 0 0 0 0 0 == fall
  • 0 the last character is used to write a combination if there is one at the end, so the array is the size of the string + 1

Substring search [Search Code]

The search is very simple, when it is equal to the increment the index of the string and the index of the substring.

When it is not equal, we get the value of the array with the index of the substring, this value becomes the index of the substring.

If the index of the substring is -1, we increment the 2 indexes otherwise we do nothing.

With the naive method, we are at 39 steps.

With the KMP method, we are at 26 steps.

Leave a Reply