This is the fourth day of my participation in the August More text Challenge. For details, see:August is more challenging

Pattern matching algorithm of data structure series

String pattern matching

Pattern Matching: Substring positioning (lndex (S; T) function)

  • Algorithm purpose

Determines the first occurrence of the substring T contained in the main string S

  • The initial conditions

The strings S and T exist, and T is a non-empty string

  • Operating results

If S has a substring with the same value as T, return its first occurrence in the main string, otherwise return 0.

1. Classical pattern matching algorithm (BF algorithm)

BF algorithm design idea

  • 1. Compare the ith (initial I =l) character of main string S with the first character of mode T. If they are equal, continue to compare subsequent characters one by one. If not, start from the next character (i++) of the main string S, and re-compare it with the first character of T.
  • 2. A sequence of consecutive substring characters up to the main string S is equal to the pattern T. The return value is the ordinal number of the first character in the subsequence of S that matches T, that is, a successful match. Otherwise, the match fails and the value 0 is returned.

#include<iostream>
#include<stdlib.h>
using namespace std;
//1. Variable length storage of string
typedef struct HString{
	char *ch;
	int length;
}HString; 

/ / 2. BF algorithm
int BruteForce(HString str,HString substr){
	int i=1,j=1,k=i;
	while(i<=str.length&&j<=substr.length){
		if(str.ch[i] == substr.ch[j]){
			i++;
			j++;
		}
		else{
			j = 1; i = ++k; }}if(j>substr.length)
		return k;
	else
		return 0; 
}


int main(a){
	return 0;
} 
Copy the code

The classical pattern matching algorithm (BF algorithm) sets the length of the main string m and the length of the substring n, then the time complexity of the worst case is O((n-L) * (m-n))=O(n*m). Disadvantages: Every time when there is a mismatch, the pointer of the main string I and the pointer of the substring J have to go back, making the algorithm inefficient.

2. The KMP algorithm

  • Feature: When the selection does not match, the main string pointer I does not fall back. Only the substring j pointer is rolled back to the specified position, which is not necessarily the first position of the substring.
  • The core of the KMP algorithm is to calculate the position to fall back to when there is a mismatch at the J pointer
  • Use the next[] array to store the required location information, where next[k] stores the position that J should fall back to if a mismatch occurs at j=k.
  • Solving for the next[] array is string dependent, not main string dependent

How to solve the next[] array?

Manual solution idea:

  • Next [l]=0 The first character in the table fails to match, indicating that the main string I is shifted back
  • Next [2]=1 The second character in the table fails to match. In this case, the substring j can only go back to the beginning of the string
  • The value of next[j] is the string length of FL or FR +1
  • FL or FR should be as large as possible
  • Solving next[] is actually looking for the maximum common element length for prefixes and suffixes

Algorithm idea:

  • Problem description:

How to find next[k +1]?

  • If T = T [j], [k] the next [k + l] = next [k] + l = j + l, then k++ j++.
  • If T T [j], [k] indicates the fallback j to the next [j], if j = O, is to determine T T [j], [k] and if j = 0, the next [k + l] = 1;
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
#define MaxSize 100
/* Construct the match array */
void BuildMatch(char *pattern,int *match)
{
  int n = strlen(pattern);
  int dis = 1,s = 0;
  while (s+dis < n)
  {
    if(pattern[s] == pattern[s+dis])
    {
      match[s+dis] = match[s+dis- 1] +1;
      s++;
    }
    else
    {s = 0;dis++;}
  }
}
/*KMP*/
int KMP(char *str,char *pattern)
{
  int n = strlen(str);
  int m = strlen(pattern);
  int s = 0,p = 0;
  int *match = (int*)malloc(sizeof(int)*m);
 
  memset(match,- 1.sizeof(int)*n);
  BuildMatch(pattern,match);
 
  while (s < n && p < m)
  {
    if(str[s] == pattern[p]){s++; p++; }else if(p > 0)p = match[p- 1] +1;
    else s++;
  }
  return p == m ? s-m+1: - 1;
}
int main(int argc, char const *argv[]) {
  char str[MaxSize],pattern[MaxSize];
  printf("Please enter a string :\n");
  scanf("%s",str);
  printf("Please enter a substring :\n");
  scanf("%s",pattern);
 
  int result = KMP(str,pattern);
  if(result == - 1)printf("Search failed \n");
  else printf('%d\n',result);
  return 0;
}
Copy the code