# One's deceased father grind the data structure of string pattern matching algorithm | BF and KMP algorithm analysis and code implementation

Posted on Dec. 3, 2022, 8:04 a.m. by Miss Hilary Rogers
Category: The back-end

This is the fourth day of my participation in the August More text Challenge. For details, see:August is more challenging

### Pattern matching algorithm of data structure series

#### String pattern matching

Pattern Matching: Substring positioning (lndex (S; T) function)

• Algorithm purpose

Determines the first occurrence of the substring T contained in the main string S

• The initial conditions

The strings S and T exist, and T is a non-empty string

• Operating results

If S has a substring with the same value as T, return its first occurrence in the main string, otherwise return 0.

#### 1. Classical pattern matching algorithm (BF algorithm)

BF algorithm design idea

• 1. Compare the ith (initial I =l) character of main string S with the first character of mode T. If they are equal, continue to compare subsequent characters one by one. If not, start from the next character (i++) of the main string S, and re-compare it with the first character of T.
• 2. A sequence of consecutive substring characters up to the main string S is equal to the pattern T. The return value is the ordinal number of the first character in the subsequence of S that matches T, that is, a successful match. Otherwise, the match fails and the value 0 is returned.

``````#includeiostream
#includestdlib.h
using namespace std;
//1. Variable length storage of string
typedef struct HString{
char *ch;
int length;
}HString;

/ / 2. BF algorithm
int BruteForce(HString str,HString substr){
int i=1,j=1,k=i;
while(i=str.lengthj=substr.length){
if(str.ch[i] == substr.ch[j]){
i++;
j++;
}
else{
j = 1; i = ++k; }}if(jsubstr.length)
return k;
else
return 0;
}

int main(a){
return 0;
}
Copy the code``````

The classical pattern matching algorithm (BF algorithm) sets the length of the main string m and the length of the substring n, then the time complexity of the worst case is O((n-L) * (m-n))=O(n*m). Disadvantages: Every time when there is a mismatch, the pointer of the main string I and the pointer of the substring J have to go back, making the algorithm inefficient.

#### 2. The KMP algorithm

• Feature: When the selection does not match, the main string pointer I does not fall back. Only the substring j pointer is rolled back to the specified position, which is not necessarily the first position of the substring.
• The core of the KMP algorithm is to calculate the position to fall back to when there is a mismatch at the J pointer
• Use the next[] array to store the required location information, where next[k] stores the position that J should fall back to if a mismatch occurs at j=k.
• Solving for the next[] array is string dependent, not main string dependent

#### How to solve the next[] array?

Manual solution idea:

• Next [l]=0 The first character in the table fails to match, indicating that the main string I is shifted back
• Next [2]=1 The second character in the table fails to match. In this case, the substring j can only go back to the beginning of the string
• The value of next[j] is the string length of FL or FR +1
• FL or FR should be as large as possible
• Solving next[] is actually looking for the maximum common element length for prefixes and suffixes

Algorithm idea:

• Problem description:

How to find next[k +1]?

• If T = T [j], [k] the next [k + l] = next [k] + l = j + l, then k++ j++.
• If T T [j], [k] indicates the fallback j to the next [j], if j = O, is to determine T T [j], [k] and if j = 0, the next [k + l] = 1;
``````#include stdio.h
#include stdlib.h
#include string.h

#define MaxSize 100
/* Construct the match array */
void BuildMatch(char *pattern,int *match)
{
int n = strlen(pattern);
int dis = 1,s = 0;
while (s+dis  n)
{
if(pattern[s] == pattern[s+dis])
{
match[s+dis] = match[s+dis- 1] +1;
s++;
}
else
{s = 0;dis++;}
}
}
/*KMP*/
int KMP(char *str,char *pattern)
{
int n = strlen(str);
int m = strlen(pattern);
int s = 0,p = 0;
int *match = (int*)malloc(sizeof(int)*m);

memset(match,- 1.sizeof(int)*n);
BuildMatch(pattern,match);

while (s  n  p  m)
{
if(str[s] == pattern[p]){s++; p++; }else if(p  0)p = match[p- 1] +1;
else s++;
}
return p == m ? s-m+1: - 1;
}
int main(int argc, char const *argv[]) {
char str[MaxSize],pattern[MaxSize];