BM algorithm

BM algorithm can realize editor search function.

The matching rules of BM algorithm are divided into bad characters and good suffixes. Suppose you look for EXAMPLE in HERE IS A SIMPLE EXAMPLE

Bad character principle

  1. Matches from back to front. The S in the main string does not match the E in the pattern string, and the S is called a bad character
  2. If a bad character is matched, mark the pattern string where the bad character appears, at position E, 6(starting from 0)
  3. Searches the pattern string to see if the bad character S comes first. If so, mark the right-most bad character position; if not, mark the bad character position -1
  4. As in the above example, S does not appear in the pattern string, then the following formula: move position = the position where the bad character appears – the position in the pattern string where the bad character appears again. In the above example, move position = 6 – (-1), that is, move 7 bits. Since the bad character appears at bit 6 of the pattern string, S does not appear again, so the position for the bad character to appear again is -1
  5. Rule of bad characters: If a bad character is encountered, two positions are needed to calculate the movement position. The first position is the position where the bad character appears, and the second position is the position to the right where the bad character appears again. If the bad character does not appear again, it is -1

Good suffixes principle

  1. In this case, the match is MPLE, which is called a good suffix
  2. Good suffixes here are MPLE, PLE, LE, E. Four good suffixes.
  3. The longest suffix is MPLE. EXA does not have the longest suffix. If so, move position = the index of the last position of the longest good suffix minus the index of the last position of the previously matched substring. It’s too complicated. Look at the example
  4. For example, assume that the good suffix is AB, and there is a good suffix AB in the preceding ABD. Take the position of the last character of the suffix, that is, the position of B in the good suffix, that is, 4. Then take the position of the last character of the good suffix, such as B in ABD, that is, 1. Move position = 4-1, move 3 places.
  5. For EXAMPLE, there is no MPLE in “EXA”. If “EXA” is “E”, then move position = the last index of the longest good suffix – the longest good suffix matched before
  6. Moving position is equal to 6 minus 0 is equal to 6
  7. If there are no good suffixes in front of it, then the last position before the mark of good suffixes will be -1, in fact, it will directly move the pattern string to the next place of good suffixes

Move position to take the maximum value of bad characters and good suffixes

KMP algorithm

  1. Match from the beginning, as in the figure above, B and A do not match, move back one place, until the match



2. The last digit of the space does not match the DUse matched ABCDAB.

3. Create onePartial match table



The search term is the character that has already been matched.

The partial match isPrefix and suffix are the same element length

Calculation method: A -> AB -> ABC up to ABCDABD

  • The prefix and suffix of [A] are empty, and the partial matching value is 0
  • The prefix of [AB] is A, and the suffix is B. If no match is found, the partial match value is 0
  • The prefix of [ABC] is A, AB, and the suffix is B, BC. If no match is found, the partial match value is 0
  • .
  • [ABCDA] The prefix is A, AB, ABC, ABCD, and the suffix is A, DA, CDA, and BCDA. The prefix is the same as the suffix A, and the length of the prefix is 1, so the partial match value is 1
  • [ABCDAB] …. The partial match value is 2

Move position = length of matched string – move position of last partial match value = 6-2 = 4

After 4 moves, AB successfully matched, but C and space did not match. Move position = 2 (matched length) -0 (last partial matched value) = 2 and so on until found.

The partial matching table above is not a next array. Now let’s see how the next array is evaluated based on the partial matching table

The next array stores where the index in the pattern string should go if a string match fails

[0, 1, 1, 1, 1, 2, 3]

I is the index in the main string

J is the index in the pattern string

  1. If the first match fails, both the main string and the pattern string should be moved one bit later, judged by 0
  2. In the second place, which is B, according to the above formula, mobile location = 1 (matched length) – 0 (before A part of bad character matching values, A part of the match value), as we know, string mode should be backwards one, after the move, which is A position before came to B, then A index? Next [1] = 1; next[1] = 1; next[1] = 1;
  3. Move position = 2(matched length) -0 (partial match value before bad character, that is, partial match value for B), move 2 bits, pattern string moves back 2 bits, A comes again, so next[3] = 1
  4. .
  5. .
  6. 6 bit B failed to match, move position = 5(matched length) -1 (partial match value of previous bit), move 4 bits, change B to align with bad character, at this time pattern string index is 2, so next[5] = 2
  7. Move position = 6-2 = 4, become C-aligned,next[6] = 3

Small white learning KMP, before is also hard to understand, I hope to help you, if there is a mistake, please help correct, thank ~!


Ruan yifeng big brother’s KMP algorithm Ruan Yifeng big brother’s BM algorithm