This is the 26th day of my participation in the August Text Challenge.More challenges in August

What is a string

A string is what we call a string, which is also a linear list. One might think of a string as a linear list of characters, but that’s not accurate. For ordinary linear tables, they tend to focus on individual elements, with each individual element having its own meaning. For example, if we used a linear table to store class grades, the element type would be defined as follows:

typedef struct{
    char num[10];	/ / student id
    char name[10];	/ / name
    float scores;	/ / score
}
Copy the code

Suppose the table stores the following elements:

Student id The name score
1809111001 zack 97.5
1809111002 rudy 94
1809111003 alice 96
1809111004 atom 99

Now let’s take the data of two people with student numbers 01 and 04, and we’ll just say the results of the two people, or the top two people. Rather than calling them as a whole (as is often the case).

For strings, it’s a little different. For example, the following string:

Do not go gentle into that good night!
Copy the code

Let’s pull out some data:

gentle
Copy the code

We can call it a word. Let’s take out another part:

good night!
Copy the code

We can say it’s a sentence. Just because each part of the string has meaning as a whole, we need to implement operations on strings and patterns in the string. More on that later.

Second, string representation

Here we use a sequential storage structure to represent a string, similar to a sequence table:

#define MAXSIZE 20
typedef struct{
    char ch[MAXSIZE+1];
    int length;
}SString;
Copy the code

Here we create a char array of length MAXSIZE+1. We do not store data for elements with subscript 0, so that the logical location corresponds to the physical location. The other and order lists are the same.

In addition to the above representation, you can use the same representation as the C language itself. Instead of storing length information, we use the character \0 to indicate the end. But the algorithm to get the length of the string in this way is order n.

Three, string implementation

(1) String assignment

String assignment is very simple, just a simple loop operation:

void StringAssign(SString *S, char *str){
    int i = 0;
    // If the current character is not \0
    while(s[i] ! ='\ 0') {// Assign the contents of the character array to the string
        S->ch[i+1] = s[i]; ++S->length; ++i; }}Copy the code

Because the array subscripts start at 0, we assign s[I] to s ->ch[I +1].

(2) String replication

Copy is a simple loop similar to assignment, except that the assignment is changed to a string:

int StringCopy(SString *S1, SString S2){
    If the string is empty, 0 is returned
    if(! S2.length){return 0;
    }
    // Loop over S2, assigning the contents of S2 to S1 in turn
    for(int i = 1; i <= S2.length; i++){
        S1->ch[i] = S2.ch[i];
    }
    // Change the length of S1
    S1->length = S2.length;
    return 1;
}
Copy the code

We don’t need to worry about S1 as it was, just overwrite it and change the length of S1. So logically we’ve done the string assignment. Physically S1 might have other characters at the end, but we don’t care.

(3) Find the length

We return the length of the string directly:

int StringLength(SString S){
    return S.length;
}
Copy the code

(4) String comparison

A string comparison is a comparison of the ASCII values of individual characters:

int StringCompare(SString S1, SString S2){
    // Iterate over S1, comparing each character of S1 and S2 in turn
    for(int i = 1; i < S1.length; i++){
        // If not the same character
        if(S1.ch[i] ! = S2.ch[i]){// Return the difference between them
            returnS1.ch[i] - S2.ch[i]; }}// Return the difference in length
    return S1.length - S2.length;
}
Copy the code

In the loop, we determine if the characters are the same. If not, return the difference between S1 and S2. We determine the string by the first character that does not match. Like the following pairs:

abc    >    abd
acd    >    add
Copy the code

If each corresponding character is matched successfully, the length of the string is compared. This is the difference between the lengths, and if the two strings are the same, then this function will return 0. If S1 is “greater” than S2, the function returns something greater than 0, otherwise it returns something less than 0.

(5) string interception

A substring is a string of any consecutive characters in a string. For example, we have a string:

Do not go gentle into that good night!
Copy the code

The following are its substrings:

Do
 not
t go gentle
good
Copy the code

The substring must exist with the original string and must be continuous.

Intercepting a substring is simple, but here it is simply by subscripting:

int SubString(SString S1, SString *S2, int pos1, int pos2){
    // If the subscript is not reasonable, return 0
    if(pos1 < 1 || pos2 > S1.length || pos2 <= pos1){
        return 0;
    }
    // Assign the truncated contents of S1 to S2
    for(int i = pos1, j = 1; i <= pos2; i++, j++){
        S2->ch[j] = S1.ch[i];
    }
    Change the length of S2
    S2->length = pos2-pos1;
    return 1;
}
Copy the code

Let’s look at two separate operations, locating strings and pattern matching.

Locate string and pattern matching

The operation of locating a substring is to find the position where the substring first appears in the original string. For example, we have the following substrings:

Do not go gentle into that good night!
Do
not
nt
Copy the code

The Do position is 1, the not position is 4, and nt appears twice in the string, denoted by the first occurrence position, which is 13. Now let’s look at how to find strings.

(1) Locate the substring

The operation of locating the substring is the process of comparing the substring continuously. We start with the pointer I of the original string pointing to the beginning of the string and compare the substring from I to I +len (where Len is the length of the substring). As shown in figure:

The red box is the part taken out to compare with the substring. If equal to the substring, we return I as the position of the substring in the original string. If that fails, I ++ until I +len is greater than the length of the original string.

The code implementation is as follows:

int IndexSubString(SString S1, SString S2){
    // Used to store the truncated portion of the original string
    SString temp;
    InitString(&temp);
    // If the substring length is greater than
    if(S1.length < S2.length){
        return 0;
    }
    // Loop to compare primitives and substrings
    for(int i = 1; i+S2.length <= S1.length; i++){
        // Intercepts the original string
        SubString(S1, &temp, i, i+S2.length);
        // Compare the captured content to the substring
        int result = StringCompare(temp, S2);
        // Return the value of I (the position of the substring) if the truncated content is equal to the substring
        if(result == 0) {returni; }}return 0;
}
Copy the code

In general, the premise of locating a substring is that it must be found in the original string. This is the case where the substring doesn’t exist. The location we do when we are not sure whether a substring can be found in the original is called pattern matching. However, pattern matching also includes the matching of some special rules, so the meaning of pattern matching is richer.

(2) Pattern matching

Above algorithm we deliberately intercept a temporary string for comparison, this step is actually not necessary, here to make it easier to look at the code. The code without auxiliary strings looks like this:

int IndexSubString(SString S, SString T) {
    // refers to the first position of the substring being compared
    int k = 1;
    // Point to the position being compared in the original string and the position being compared in the pattern string
    int i = k, j = 1;
    // loop comparison
    while (k <= S.length && j <= T.length){
        if(S.ch[i] == T.ch[j]){
        	// If the current character matches successfully, the match continues
            i++;
            j++;
        }else{
            // if the current character fails to match, k points to the next substring and I is synchronized with k
            k++;
            i=k;
            j=1; }}// Prevent the string from running short
    if(j > T.length)
        return k;
    else
        return 0;
}
Copy the code

The three Pointers k, I and J point to positions as shown in the figure:

After the loop, we also judge whether j is greater than s2.length. Here we can simulate the process of matching the following string:

abaccdo
cdoo
Copy the code

When we match to the end, the loop exits normally. But the j pointer points to the first O in the pattern string, in which case our algorithm should return a failed match. That’s what the last if statement does.

To be precise, the above two algorithms are pattern matching algorithms. There is also an important KMP algorithm in the string. Due to space constraints, KMP algorithm will be written in a separate article.