KMP algorithm is a general algorithm for computer string matching. This wiki article describes the understanding of the KMP algorithm in an easy-to-understand way with simple examples.

Matching value table

For KMP, the “match value table” is critical. Let’s start with a simple example to describe how a match table is generated so you can understand.

Now the string we need to look for is “ABABABCA”.

Before describing “match tables”, we need a brief introduction to the concepts of prefixes and suffixes:

Prefix: a collection of strings of length from 0 to (len-1). Suffix: a collection of strings of length from 1 to (Len-1) in reverse order

string The prefix collection The suffix collection Intersection of prefix and suffix
“A” [] [] []
“AB” [A] [B] []
“ABA” [A,AB] [A, BA] [A]
“ABAB” [A, AB, ABA] [B, AB, BAB] [AB]
“ABABA” [A, AB, ABA, ABAB] [A, BA, ABA, BABA] [A, ABA]
“ABABAB” [A, AB, ABA, ABAB, ABABA] [B, AB, BAB, ABAB, BABAB] [AB, ABAB]
“ABABABC” [A, AB, ABA, ABAB, ABABA, ABABAB] [C, BC, ABC, BABC, ABABC, BABABC] []
“ABABABCA” [A, AB, ABA, ABAB, ABABA, ABABAB, ABABABC] [A, CA, BCA, ABCA, BABCA, ABABCA, BABABCA] [A]

From the table above, the concept of prefixes and suffixes can be understood if you are patient.

So what does “match value” mean?

“Match value” refers to the set of prefixes and suffixes, the length of the longest common element, that is, the length of the longest element in the intersection

Then it is not difficult to find from the above table that each character (index) corresponds to “value” :

char: | A | B | A | B | A | B | C | A |
index:| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
value:| 0 | 0 | 1 | 2 | 3 | 4 | 0 | 1 |
Copy the code

Use of matching value tables

We can speed up the process of finding matches based on a table of match values.

Here are some examples:

Look for the above string “ABABABCA”(pattern) in the string “bacbababaabcbababca “(text).

The first successful match is as follows:

BACBABABAABCBABABABCA
 |
 ABABABCA
Copy the code

It is found that the second “C” of text does not match the second “B” of pattern, so the current partial matching length is 1(there is only one A), and the current matching value is 0 according to the above matching value table.

Move bit = Length of matched characters - Matching value of the corresponding bit

So the number of moves is equal to 1 minus 0, so we’re going to move one bit further back to match.

If the match is successful again:

BACBABABAABCBABABABCA
    |||||
    ABABABCA
Copy the code

At this point, “A” in the text does not match “B” in the pattern. If the algorithm is not followed, it must continue to match one bit later. According to the above calculation formula:

Moving digit = “ABABA”. The matching value of length-pattern [4] is 5-3 = 2

So we can move back two places at a time:

BACBABABAABCBABABABCA
    xx|||
      ABABABCA
Copy the code

The matching value of length-pattern [2] is 3-1 = 2

Continue to move two places:

BACBABABAABCBABABABCA
      xx|
        ABABABCA
Copy the code

Continue with “A”. The matching value of length-pattern [0] is 1-0 = 1

Move back one:

BACBABABAABCBABABABCA
        x||
         ABABABCA
Copy the code

Continue with “AB”. The matching value of length-pattern [1] is 2-0 = 2

Move back two places:

BACBABABAABCBABABABCA
         xx|
           ABABABCA
Copy the code

The first one doesn’t match, so we keep moving back until we get a match

BACBABABAABCBABABABCA
             ||||||||
             ABABABCA
Copy the code

After a few moves (step=1), the final match is found.

Reference: jakeboxer.com/blog/2009/1…